I, L. MICHAEL FURNESS, a citizen of the United Kingdom, residing at 2 Brookside,
Exning, Newmarket, United Kingdom, declare that: 

1. I was employed by Incyte Genomics, Inc. (hereinafter "Incyte") as a Director 
of Pharmacogenomics until December 31, 2001. I am currently under contract to be a Consultant to 
Incyte Genomics, Inc. 



2. In 1984, I received a B.Sc.(Hons) in Biomolecular Science (Biophysics and 
Biochemistry) from Portsmouth Polytechnic. 

From 1985 to 1987 I was at the School of Pharmacy in London, United Kingdom, during
which time I analyzed lipid methyltransferase enzymes using a variety of protein analysis methods,
including one-dimensional (1D) and two-dimensional (2D) gel electrophoresis, HPLC, and a variety of
enzymatic assay systems.
I then worked in the Protein Structure group at the National Institute for Medical
Research until 1989, setting up core facilities for nucleic acid synthesis and sequencing, as well as 
assisting in programs on protein kinase C inhibitors. 

After a year at Perkin Elmer-Applied Biosystems as a technical specialist, I worked at 
the Imperial Cancer Research Fund between 1990 and 1992 on a Eureka-funded program collaborating
with Amersham Pharmacia in the United Kingdom and CEPH (Centre d'Etude du Polymorphisme 
Humaine) in Paris, France, to develop novel nucleic acid purification and characterization methods. 

In 1992, I moved to Pfizer Central Research in the United Kingdom, where I stayed
until 1998, initially setting up core DNA sequencing and then a DNA arraying facility for gene 
expression analysis in 1993. My work also included bioinformatics and I was responsible for the 
support of all Pfizer neuroscience programs in the United Kingdom. This then led me into carrying out 
detailed bioinformatics and wet lab work on the sodium channels, including antibody generation, 
Western and Northern analyses, PCR, tissue distribution studies, and sequence analyses on novel 
sequences identified. 

In 1998, I moved to Incyte Genomics, Inc., to the Pharmacogenomics group to look at
the application of genomics and proteomics to the pharmaceutical industry. In 1999, I was appointed
director of the LifeExpress Lead Program which used microarray and protein expression data to 
identify pharmacologically and toxicologically relevant mechanisms to assist in improved drug design 
and development. 

On December 12, 2001, I founded Nuomics Consulting Ltd., in Exning, U.K., and I am
currently employed as Managing Director. Nuomics Consulting Ltd. will be providing expert technical 
knowledge and advice to businesses around the areas of genomics, proteomics, pharmacogenomics, 
toxicogenomics and chemogenomics. 

3. I have reviewed the specification of a United States patent application that I 
understand was filed on December 11, 2001 in the names of Yue et al. and was assigned Serial No. 
10/018,170 (hereinafter "the Yue '170 application"). Furthermore, I understand that this United States 
patent application claimed priority to United States Provisional Patent Application Serial No. 
60/139,566 filed on June 16, 1999 (hereinafter "the Yue '566 application"). The SEQ ID NO: 12-encoding
polynucleotides were described in the Yue '566 application. (Note that the sequences of
SEQ ID NO: 12 and SEQ ID NO: 64 disclosed in the Yue '170 application are identical to the
sequences referred to as SEQ ID NO: 12 and SEQ ID NO: 31, respectively, in the Yue '566
application). My remarks herein will therefore be directed to the Yue '566 patent application, and June 
16, 1999, as the relevant date of filing. In broad overview, the Yue '566 specification pertains to 
certain nucleotide and amino acid sequences and their use in a number of applications, including gene 
and protein expression monitoring applications that are useful in connection with (a) developing drugs 
(e.g., for the treatment of cancer), and (b) monitoring the activity of drugs for purposes relating to 
evaluating their efficacy and toxicity. 

4. I understand that (a) the Yue '170 application contains claims that are directed 
to a substantially purified polypeptide having the sequence disclosed in the Yue '170 application as 
SEQ ID NO: 12 (hereinafter "the SEQ ID NO: 12 polypeptide"), and (b) the Patent Examiner has 
rejected those claims on the grounds that the specification of the Yue '170 application does not disclose 
a substantial, specific and credible utility for the claimed SEQ ID NO: 12 polypeptide. I further 
understand that whether or not a patent specification discloses a substantial, specific and credible utility 
for its claimed subject matter is properly determined from the perspective of a person skilled in the art 
to which the specification pertains at the time the patent application was filed. In addition, I
understand that a substantial, specific and credible utility under the patent laws must be a "real-world" 
utility. 

5. I have been asked (a) to consider with a view to reaching a conclusion (or 
conclusions) as to whether or not I agree with the Patent Examiner's position that the Yue '170 
application and its parent, the Yue '566 application, do not disclose a substantial, specific and
credible "real-world" utility for the claimed SEQ ID NO: 12 polypeptide, and (b) to state and explain 
the bases for any conclusions I reach. I have been informed that, in connection with my considerations, 
I should determine whether or not a person skilled in the art to which the Yue '566 application pertains 
on June 16, 1999, would have concluded that the '566 application disclosed, for the benefit of the
public, a specific beneficial use of the SEQ ID NO: 12 polypeptide in its then available and disclosed
form. I have also been informed that, with respect to the "real-world" utility requirement, the Patent
and Trademark Office instructs its Patent Examiners in Section 2107 of the Manual of Patent Examining
Procedure, under the heading "I. 'Real-World Value' Requirement":

"Many research tools such as gas chromatographs, screening assays, and 
nucleotide sequencing techniques have a clear, specific and unquestionable utility (e.g., 
they are useful in analyzing compounds). An assessment that focuses on whether an 
invention is useful only in a research setting thus does not address whether the specific 
invention is in fact 'useful' in a patent sense. Instead, Office personnel must distinguish 
between inventions that have a specifically identified utility and inventions whose 
specific utility requires further research to identify or reasonably confirm." 

6. I have considered the matters set forth in paragraph 5 of this Declaration and 
have concluded that, contrary to the position I understand the Patent Examiner has taken, the 
specification of the Yue '566 patent application disclosed to a person skilled in the art at the time of its 
filing a number of substantial, specific and credible real-world utilities for the claimed SEQ ID NO: 12 
polypeptide. More specifically, persons skilled in the art on June 16, 1999 would have understood the 
Yue '566 application to disclose the use of the SEQ ID NO: 12 polypeptide as a research tool in a 
number of gene and protein expression monitoring applications that were well-known at that time to be 
useful in connection with the development of drugs and the monitoring of the activity of such drugs. I 
explain the bases for reaching my conclusion in this regard in paragraphs 7-13 below. 

7. In reaching the conclusion stated in paragraph 6 of this Declaration, I 
considered (a) the specification of the Yue '566 application, and (b) a number of published articles and 
patent documents that evidence gene and protein expression monitoring techniques that were 
well-known before the June 16, 1999 filing date of the Yue '566 application. The published articles 
and patent documents I considered are: 
(a) Anderson, N.L., Esquer-Blasco, R., Hofmann, J.-P., Anderson, N.G.,
A Two-Dimensional Gel Database of Rat Liver Proteins Useful in Gene Regulation and Drug Effects
Studies, Electrophoresis, 12, 907-930 (1991) (hereinafter "the Anderson 1991 article") (copy annexed
at Tab A); 

(b) Anderson, N.L., Esquer-Blasco, R., Hofmann, J.-P., Meheus, L.,
Raymackers, J., Steiner, S., Witzmann, F., Anderson, N.G., An Updated Two-Dimensional Gel
Database of Rat Liver Proteins Useful in Gene Regulation and Drug Effect Studies, Electrophoresis, 16,
1977-1981 (1995) (hereinafter "the Anderson 1995 article") (copy annexed at Tab B); 

(c) Wilkins, M.R., Sanchez, J.-C., Gooley, A.A., Appel, R.D.,
Humphery-Smith, I., Hochstrasser, D.F., Williams, K.L., Progress with Proteome Projects: Why all
Proteins Expressed by a Genome Should be Identified and How To Do It, Biotechnology and Genetic
Engineering Reviews, 13, 19-50 (1995) (hereinafter "the Wilkins article") (copy annexed at Tab C); 

(d) Celis, J.E., Rasmussen, H.H., Leffers, H., Madsen, P., Honore, B., 
Gesser, B., Dejgaard, K., Vandekerckhove, J., Human Cellular Protein Patterns and their Link to 
Genome DNA Sequence Data: Usefulness of Two-Dimensional Gel Electrophoresis and 
Microsequencing, FASEB Journal, 5, 2200-2208 (1991) (hereinafter "the Celis article") (copy 
annexed at Tab D); 

(e) Franzen, B., Linder, S., Okuzawa, K., Kato, H., Auer, G., 
Nonenzymatic Extraction of Cells from Clinical Tumor Material for Analysis of Gene Expression by 
Two-Dimensional Polyacrylamide Gel Electrophoresis, Electrophoresis, 14, 1045-1053 (1993)
(hereinafter "the Franzen article") (copy annexed at Tab E); 

(f) Bjellqvist, B., Basse, B., Olsen, E., Celis, J.E., Reference Points for 
Comparisons of Two-Dimensional Maps of Proteins from Different Human Cell Types Defined in a pH 
Scale Where Isoelectric Points Correlate with Polypeptide Compositions, Electrophoresis, 15,
529-539 (1994) (hereinafter "the Bjellqvist article") (copy annexed at Tab F);

(g) Large Scale Biology Company Info; LSB and LSP Information; from 
http://www.lsbc.com (2001) (copy annexed at Tab G).

8. Many of the published articles I considered (i.e., at least items (a)-(f) identified
in paragraph 7) relate to the development of protein two-dimensional gel electrophoretic techniques for 
use in protein expression monitoring applications in drug development and toxicology. As I will discuss 
below, a person skilled in the art who read the Yue '566 application on June 16, 1999 would have 
understood that application to disclose the SEQ ID NO: 12 polypeptide to be useful for a number of 
protein expression monitoring applications, e.g., in the use of two-dimensional polyacrylamide gel 
electrophoresis and western blot analysis of tissue samples in drug development and in toxicity testing. 

Furthermore, items (a)-(f) establish that protein two-dimensional polyacrylamide gel 
electrophoresis and western blot analysis were well-known and established methods routinely used in 
toxicology testing and drug development at the time of filing the Yue '566 application and for several 
years prior to June 16, 1999. As such, one of ordinary skill in the art would have recognized that the 
polypeptide of SEQ ID NO: 12 could be used in toxicology testing and drug development, irrespective 
of its biochemical activities. 

9. Turning more specifically to the Yue '566 specification, the SEQ ID NO: 12 
polypeptide is shown at pages 14-15 as one of 38 sequences under the heading "Sequence Listing." 
The Yue '566 specification specifically teaches that the invention features a "substantially purified
polypeptide, human annexin (INSIG-12), having the amino acid sequence shown in SEQ ID NO: 12"
(Yue '566 application at p. 3 and Table 2). It further teaches that (a) the identity of the SEQ ID
NO: 12 polypeptide was determined from a diseased gallbladder tissue cDNA library (GBLANOT01)
(Yue '566 application, Table 4), (b) the SEQ ID NO: 12 polypeptide is the annexin referred to as
"INSIG-12" and is encoded by SEQ ID NO: 64, and (c) northern analysis shows that "INSIG-12 is
expressed predominantly in cDNA libraries associated with reproductive, gastrointestinal, and nervous
system tissues and in tissues associated with cancer and inflammation" (Yue '566 application at Table
3).

The Yue '566 application discusses a number of uses of the SEQ ID NO: 12 
polypeptide in addition to its use in protein expression monitoring applications. I have not fully 
evaluated these additional uses in connection with the preparation of this Declaration and do not 
express any views in this Declaration regarding whether or not the Yue '566 specification discloses
these additional uses to be substantial, specific and credible real-world utilities of the SEQ ID NO: 12 
polypeptide. Consequently, my discussion in this Declaration concerning the Yue '566 application 
focuses on the portions of the application that relate to the use of the SEQ ID NO: 12 polypeptide in 
gene and protein expression monitoring applications. 

10. The Yue '566 application discloses that the polynucleotide sequences disclosed 
therein, including the polynucleotides encoding the SEQ ID NO: 12 polypeptide, are useful as probes in 
chip based technologies. It further teaches that the chip based technologies can be used "for the 
detection and/or quantification of nucleic acid or protein" (Yue '566 application at p. 20, lines 17-20). 

The Yue '566 application also discloses that the SEQ ID NO: 12 polypeptide is useful 
in other protein expression detection technologies. The Yue '566 application states that "Immunological methods
for detecting and measuring the expression of INSIG using either specific polyclonal or monoclonal
antibodies are known in the art. Examples of such techniques include enzyme-linked immunosorbent 
assays (ELISAs), radioimmunoassays (RIAs), and fluorescence activated cell sorting (FACS)" (Yue 
'566 application at p. 20, lines 21-24). Furthermore, the Yue '566 application discloses that "[a] 
variety of protocols for measuring INSIG, including ELISAs, RIAs, and FACS, are known in the art 
and provide a basis for diagnosing altered or abnormal levels of INSIG expression. Normal or 
standard values for INSIG expression are established by combining body fluids or cell extracts taken 
from normal mammalian subjects, preferably human, with antibody to INSIG under conditions suitable 
for complex formation" (Yue '566 application at p. 31, lines 2-6). 

In addition, at the time of filing the Yue '566 application, it was well known in the art
that gene and protein expression analyses also included two-dimensional polyacrylamide gel
electrophoresis (2-D PAGE) technologies, which were developed during the 1980s, as exemplified
by the Anderson 1991 and 1995 articles (Tab A and Tab B). The Anderson 1991 article teaches that
a 2-D PAGE map has been used to connect and compare hundreds of 2-D gels of rat liver samples
from a variety of studies, including regulation of protein expression by various drugs and toxic agents
(Tab A at p. 907). The Anderson 1991 article teaches an empirically-determined standard curve fitted
to a series of identified proteins based upon amino acid chain length (Tab A at p. 911) and how that
standard curve can be used in protein expression analysis. The Anderson 1991 article also teaches that
"there is a long-term need for a comprehensive database of liver proteins" (Tab A at p. 912).
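By way of illustration only, a standard curve of the general kind described in the Anderson 1991 article can be sketched in Python. The sketch assumes the widely used approximation that, in the SDS dimension of a 2-D gel, the logarithm of molecular mass varies roughly linearly with spot position, and fits that line by ordinary least squares to internal standards of known mass. The function names, spot positions and masses are hypothetical and are not taken from the Anderson 1991 article or from the Yue '566 application.

```python
"""Illustrative sketch: estimating SDS molecular mass from 2-D gel position.

Assumption: log10(mass) is approximately linear in the vertical (SDS dimension)
spot position. All standards below are hypothetical example values.
"""
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(standards):
    """Build a position -> mass (Da) function from (position, known mass) pairs."""
    positions = [pos for pos, _ in standards]
    log_masses = [math.log10(mass) for _, mass in standards]
    slope, intercept = fit_line(positions, log_masses)

    def estimate_mass(position):
        # Invert the log-linear standard curve at the given spot position.
        return 10 ** (intercept + slope * position)

    return estimate_mass


if __name__ == "__main__":
    # Hypothetical internal standards: (vertical spot position, mass in Da).
    standards = [(100, 120000), (250, 60000), (400, 30000), (550, 15000)]
    estimate = calibrate(standards)
    print(f"Estimated mass at position 325: {estimate(325):.0f} Da")
```

With real data the internal standards would not lie exactly on a line, and a laboratory might prefer a curved fit; the linear form is used here only to keep the sketch short.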

The Wilkins article is one of a number of documents that were published prior to the 
June 16, 1999 filing date of the Yue '566 application that describes the use of the 2-D PAGE 
technology in a wide range of gene and protein expression monitoring applications, including monitoring 
and analyzing protein expression patterns in human cancer, human serum plasma proteins, and in rodent 
liver following exposure to toxins. In view of the Yue '566 application, the Wilkins article, and other
related pre-June 1999 publications, persons skilled in the art on June 16, 1999 clearly would have
understood the Yue '566 application to disclose the SEQ ID NO: 12 polypeptide to be useful in 2-D 
PAGE analyses for the development of new drugs and monitoring the activities of drugs for such 
purposes as evaluating their efficacy and toxicity, as explained more fully in paragraph 12 below. 

With specific reference to toxicity evaluations, those of skill in the art who were 
working on drug development in June 1999 (and for many years prior to June 1999) without any doubt 
appreciated that the toxicity (or lack of toxicity) of any proposed drug they were working on was one 
of the most important criteria to be considered and evaluated in connection with the development of the 
drug. They would have understood at that time that good drugs are not only potent, they are specific. 
This means that they have strong effects on a specific biological target and minimal effects on all other 
biological targets. Ascertaining that a candidate drug affects its intended target, and identification of 
undesirable secondary effects (i.e., toxic side effects), had been for many years among the main 
challenges in developing new drugs. The ability to determine which genes are positively affected by a
given drug, coupled with the ability to identify, as quickly and as early in the drug development
process as possible, drugs that are likely to be toxic because of their undesirable secondary effects, has
enormous value in improving the efficiency of the drug discovery process and is an important and
essential part of the development of any new drug. In fact, the desire to identify and understand
toxicological effects using the experimental assays described above led Dr Leigh Anderson to found the 
Large Scale Biology Corporation in 1985, in order to pursue commercial development of the 2-D 
electrophoretic protein mapping technology he had developed. In addition, the company focused on 
toxicological effects on the proteome as clearly demonstrated by its goals and by its senior management
credentials described in company documents (see Tab G at pp. 1, 3, and 5). 

Accordingly, the teachings in the Yue '566 application, in particular regarding use of 
SEQ ID NO: 12 in differential gene and protein expression analysis (2-D PAGE maps) and in the 
development and the monitoring of the activities of drugs, clearly include toxicity studies, and persons
skilled in the art who read the Yue '566 application on June 16, 1999 would have understood that to 
be so. 

11. As previously discussed (supra, paragraphs 7 and 8), my experience with 
protein analysis methods in the mid-1980s and the several publications annexed to this Declaration at 
Tabs A through F evidence information that was available to the public regarding two-dimensional 
polyacrylamide gel electrophoresis technology and its uses in drug discovery and toxicology testing 
before the June 16, 1999 filing date of the Yue '566 application. In particular, the Celis article stated
that "protein databases are expected to foster a variety of biological information.... among others,
drug development and testing" (See Tab D, p. 2200, second column). The Franzen article shows that
2-D PAGE maps were used to identify proteins in clinical tumor material (See Tab E). The Yue '566 
application clearly discloses that expression of INSIG is associated with reproductive, gastrointestinal, 
and nervous system tissues and tissues associated with cancer and inflammation. (Yue '566 application 
at Table 3). The Bjellqvist article showed that a protein may be identified accurately by its positional 
co-ordinates, namely molecular mass and isoelectric point (See Tab F). The Yue '566 application 
clearly disclosed SEQ ID NO: 12 from which it would have been routine for one of skill in the art to 
predict both the molecular mass and the isoelectric point using algorithms well known in the art at the 
time of filing. 
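The kind of routine calculation referred to above can be illustrated with a short Python sketch that predicts the average molecular mass and the isoelectric point (pI) of a polypeptide from its amino acid sequence. The residue masses are standard average values. The pKa set is one commonly tabulated set and is an assumption: the Bjellqvist scale (Tab F) uses different, partly position-dependent values, so a tool based on that scale would give somewhat different pI values. The function names and the demonstration peptide are hypothetical; the peptide is not SEQ ID NO: 12.

```python
"""Illustrative sketch: average molecular mass and pI from an amino acid sequence."""

# Standard average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini

# One commonly tabulated pKa set (an assumption; other scales differ).
PKA_N_TERM = 9.69
PKA_C_TERM = 2.34
PKA_POSITIVE = {"K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def average_mass(seq: str) -> float:
    """Sum of residue masses plus one water; unknown letters raise KeyError."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge at the given pH."""
    seq = seq.upper()
    # Basic groups: fraction protonated (+1) is 1 / (1 + 10^(pH - pKa)).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    for aa, pka in PKA_POSITIVE.items():
        charge += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    # Acidic groups: fraction deprotonated (-1) is 1 / (1 + 10^(pKa - pH)).
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa, pka in PKA_NEGATIVE.items():
        charge -= seq.count(aa) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH of zero net charge; charge falls monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    demo = "MSTKLVDE"  # hypothetical peptide for demonstration only
    print(f"mass = {average_mass(demo):.2f} Da, pI = {isoelectric_point(demo):.2f}")
```

A predicted (mass, pI) pair of this kind gives an expected position on a calibrated 2-D map, against which observed spots could then be compared; this is the sort of positional comparison the Bjellqvist article discusses.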

12. A person skilled in the art on June 16, 1999, who read the Yue '566 
application, would understand that application to disclose the SEQ ID NO: 12 polypeptide to be highly 
useful in analysis of differential expression of proteins. For example, the specification of the Yue '566 
application would have led a person skilled in the art in June 1999 who was using protein expression 
monitoring in connection with working on developing new drugs for the treatment of cancer, immune
disorders, neurological disorders, and gastrointestinal disorders to conclude that a 2-D PAGE map that 
used the substantially purified SEQ ID NO: 12 polypeptide would be a highly useful tool and to request 
specifically that any 2-D PAGE map that was being used for such purposes utilize the SEQ ID NO: 12 
polypeptide sequence. Expressed proteins are useful for 2-D PAGE analysis in toxicology expression 
studies for a variety of reasons, particularly for purposes relating to providing controls for the 2-D 
PAGE analysis, and for identifying sequence or post-translational variants of the expressed sequences 
in response to exogenous compounds. Persons skilled in the art would appreciate that a 2-D PAGE 
map that utilized the SEQ ID NO: 12 polypeptide sequence would be a more useful tool than a 2-D 
PAGE map that did not utilize this protein sequence in connection with conducting protein expression 
monitoring studies on proposed (or actual) drugs for treating cancer, immune disorders, neurological 
disorders, and gastrointestinal disorders for such purposes as evaluating their efficacy and toxicity. 

I discuss in more detail in items (a)-(b) below a number of reasons why a person skilled 
in the art, who read the Yue '566 specification in June 1999, would have concluded based on that 
specification and the state of the art at that time, that SEQ ID NO: 12 polypeptide would be a highly 
useful tool for analysis of a 2-D PAGE map for evaluating the efficacy and toxicity of proposed drugs 
for cancer, immune disorders, neurological disorders, and gastrointestinal disorders by means of 2-D 
PAGE maps, as well as for other evaluations: 

(a) The Yue '566 specification contains a number of teachings that would lead 
persons skilled in the art on June 16, 1999 to conclude that a 2-D PAGE map that utilized the 
substantially purified SEQ ID NO: 12 polypeptide would be a more useful tool for protein expression
monitoring applications relating to drugs for treating cancer, immune disorders, neurological disorders, 
and gastrointestinal disorders than a 2-D PAGE map that did not use the SEQ ID NO: 12 polypeptide 
sequence. Among other things, the Yue '566 specification teaches that (i) the identity of the SEQ ID 
NO: 12 polypeptide was determined from a "diseased gallbladder" tissue cDNA library 
(GBLANOT01) (Yue '566 application, Table 4), (ii) the SEQ ID NO: 12 polypeptide is the annexin 
referred to as INSIG-12, and (iii) INSIG-12 is expressed in various libraries derived from 
reproductive, gastrointestinal, and nervous system tissues and in tissues associated with cancer and 
inflammation (Yue '566 application at Table 3), and therefore "INSIG appears to play a role in cancer,
immune disorders, neurological disorders, and gastrointestinal disorders" (Yue '566 application at p.
22; see paragraph 9, supra). The substantially purified polypeptide could therefore be used as a
control to more accurately gauge the expression of INSIG in the sample and consequently more
accurately gauge the effect of a toxicant on expression of the gene.

(b) Persons skilled in the art on June 16, 1999 would have appreciated (i) that 
the protein expression monitoring results obtained using a 2-D PAGE map that utilized a SEQ ID 
NO: 12 polypeptide would vary, depending on the particular drug being evaluated, and (ii) that such 
varying results would occur both with respect to the results obtained from the SEQ ID NO: 12 
polypeptide and from the 2-D PAGE map as a whole (including all its other individual proteins). These 
kinds of varying results, depending on the identity of the drug being tested, in no way detract from my
conclusion that persons skilled in the art on June 16, 1999, having read the Yue '566 specification, 
would specifically request that any 2-D PAGE map that was being used for conducting protein 
expression monitoring studies on drugs for treating cancer, immune disorders, neurological disorders, 
and gastrointestinal disorders (e.g., a toxicology study or any efficacy study of the type that typically
takes place in connection with the development of a drug) utilize the SEQ ID NO: 12 polypeptide 
sequence. Persons skilled in the art on June 16, 1999 would have wanted their 2-D PAGE map to 
utilize the SEQ ID NO: 12 polypeptide sequence because a 2-D PAGE map that utilized protein 
sequence information for the polypeptide (as compared to one that did not) would provide more useful
results in the kind of protein expression monitoring studies using 2-D PAGE maps that persons skilled 
in the art have been doing since well prior to June 16, 1999. 

The foregoing is not intended to be an all-inclusive explanation of all my reasons for 
reaching the conclusions stated in this paragraph 12, and in paragraph 6, supra. In my view, however, 
it provides more than sufficient reasons to justify my conclusions stated in paragraph 6 of this 
Declaration regarding the Yue '566 application disclosing to persons skilled in the art at the time of its 
filing substantial, specific and credible real-world utilities for the SEQ ID NO: 12 polypeptide. 

13. Also pertinent to my considerations underlying this Declaration is the fact that
the Yue '566 disclosure regarding the uses of the SEQ ID NO: 12 polypeptide for protein expression 
monitoring applications is not limited to the use of that protein in 2-D PAGE maps. For one thing, the 
Yue '566 disclosure regarding the techniques used in gene and protein expression monitoring
applications is broad (Yue '566 application at, e.g., p. 20, lines 16-20 and p. 31, lines 2-9). 

In addition, the Yue '566 specification repeatedly teaches that the protein described 
therein (including the SEQ ID NO: 12 polypeptide) may desirably be used in any of a number of long 
established "standard" techniques, such as ELISA or western blot analysis, for conducting protein 
expression monitoring studies. See, e.g.: 

(a) Yue '566 application at p. 20, lines 21-24 ("Immunological methods for 
detecting and measuring the expression of INSIG using either specific polyclonal or monoclonal 
antibodies are known in the art. Examples of such techniques include enzyme-linked immunosorbent 
assays (ELISAs), radioimmunoassays (RIAs), and fluorescence activated cell sorting (FACS)");

(b) Yue '566 application at p. 31, lines 2-9 ("A variety of protocols for 
measuring INSIG, including ELISAs, RIAs, and FACS, are known in the art and provide a basis for 
diagnosing altered or abnormal levels of INSIG expression. Normal or standard values for INSIG 
expression are established by combining body fluids or cell extracts taken from normal mammalian 
subjects, preferably human, with antibody to INSIG under conditions suitable for complex formation. 
The amount of standard complex formation may be quantitated by various methods, preferably by 
photometric means. Quantities of INSIG expressed in subject, control, and disease samples from 
biopsied tissues are compared with the standard values. Deviation between standard and subject 
values establishes the parameters for diagnosing disease"). 

Thus a person skilled in the art on June 16, 1999, who read the Yue '566 specification, 
would have routinely and readily appreciated that the SEQ ID NO: 12 polypeptide disclosed therein 
would be useful to conduct protein expression monitoring analyses using 2-D PAGE mapping or 
western blot analysis or any of the other traditional membrane-based protein expression monitoring 
techniques that were known and in common use many years prior to the filing of the Yue '566 
application. For example, a person skilled in the art in June 1999 would have routinely and readily 
appreciated that the SEQ ID NO: 12 polypeptide would be a useful tool in conducting protein
expression analyses, using the 2-D PAGE mapping or western analysis techniques, in furtherance of (a) 
the development of drugs for the treatment of cancer, immune disorders, neurological disorders, and 
gastrointestinal disorders, and (b) analyses of the efficacy and toxicity of such drugs. 

14. I declare further that all statements made herein of my own knowledge are true 
and that all statements made herein on information and belief are believed to be true; and further, that 
these statements were made with the knowledge that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, and that willful false statements may jeopardize the validity 
of this application and any patent issuing thereon. 




L. Michael Furness, B.Sc. 



Signed at Exning, United Kingdom 
this l^ day of August, 2003 

A two-dimensional gel database of rat liver proteins 
useful in gene regulation and drug effects studies 

A standard two-dimensional (2-D) protein map of Fischer 344 rat liver
(F344MST3) is presented, with a tabular listing of more than 1200 protein species.
Sodium dodecyl sulfate (SDS) molecular mass and isoelectric point have been
established, based on positions of numerous internal standards. This map has been
used to connect and compare hundreds of 2-D gels of rat liver samples from a
variety of studies, and forms the nucleus of an expanding database describing rat
liver proteins and their regulation by various drugs and toxic agents. An example
of such a study, involving regulation of cholesterol synthesis by cholesterol-lowering
drugs and a high-cholesterol diet, is presented. Since the map has been
obtained with a widely used and highly reproducible 2-D gel system (the Iso-Dalt
system), it can be directly related to an expanding body of work in other
laboratories.
1 Introduction

High-resolution two-dimensional electrophoresis of proteins, introduced in 1975 by
O'Farrell and others [1-4], has been used over the ensuing 16 years to examine a wide
variety of biological systems, the results appearing in more than 5000 published papers.
With the advent of computerized systems for analyzing two-dimensional (2-D) gel images
and constructing spot databases, it is also possible to plan and assemble integrated
bodies of information describing the appearance and regulation of thousands of protein
gene products [5, 6]. Creating such databases involves amassing and organizing
quantitative data from thousands of 2-D gels, and requires a substantial commitment in
technology and resources.

Given the long-term effort required to develop a protein database, the choice of a
biological system takes on considerable importance. While in vitro systems are ideal for
answering many experimental questions, especially in cancer research and genetics, our
experience with cell cultures and tissue samples suggests that some in vivo approaches
could have major advantages. In particular, we have noticed that liver tissue samples
from rats and mice appear to show greater quantitative reproducibility (in terms of
individual protein expression) than replicate cell cultures. This is perhaps a natural
result of the homeostasis maintained in a complete animal vs. the well-known variability
of cell cultures, the latter due principally to differences in reagents (e.g., fetal
bovine serum), conditions (e.g., pH) and genetic "evolution" of cell lines while in
culture. It is also more difficult to generate adequate amounts of protein from cell
culture systems (particularly with attached cells), forcing the investigator to resort
to radioisotope-based or silver-based stain detection methods. While these methods are
more sensitive (sometimes much more sensitive) than the Coomassie Brilliant Blue (CBB)
stain typically used for protein detection in "large" protein samples, they are generally
more variable, more labor-intensive and, in the case of radiographic methods, may
generate highly "noisy" images, due to the properties of the films used. By contrast,
large protein samples can easily be prepared from liver using urea/Nonidet P-40 (NP-40)
solubilization and stained with CBB, which has the advantage of being easily
reproducible [8]. Finally, there remains the question of the "truthfulness" of many in
vitro systems as compared to their in vivo analogs; how great are the changes caused by
the introduction into a culture and the associated shift to strong selection for growth,
and how do these affect experimental outcomes? Hence the apparent advantages of in vitro
systems, in terms of experimental manipulation, may be counterbalanced by other factors
relating to 2-D data quality.

There is a second important class of reasons for exploring the use of an in vivo biological system such as the liver. Historically, there have been two broad approaches to the mechanistic dissection of biochemical processes in intact cellular systems: genetics (a search for informative mutants) and the use of chemical agents (drugs and chemical toxins). Both approaches help us to understand complex systems by disrupting some specific functional element and showing us the result. With the development of techniques for genetic manipulation and cloning, the genetic approach can be effectively applied either in vitro or in vivo, although the in vitro route is usually quicker. The chemical approach can also be applied to either sort of biological system; here, however, the bulk of consistently acquired information is in experimental animals (rats and mice). While most biologists know a short list of compounds having specific, experimentally useful effects (e.g., inhibitors of protein synthesis, ionophores, polymerase inhibitors, channel blockers, nucleotide analogs, and compounds affecting polymerization of cytoskeletal proteins), there is a much larger number of interesting chemically induced effects, most of them characterized by toxicologists and pharmacologists in rodent systems. Just as a thorough genetic analysis would involve saturating a genome with mutations, it is possible to imagine a saturating number of drugs, the analysis of whose actions would reveal the complete biochemistry of the cell. While organized drug discovery efforts usually target specific desired effects, the nature of the process, with its dependence on screening large numbers of compounds, necessarily produces many unanticipated effects. It is therefore reasonable to suppose that the broad range of compounds necessary to achieve "biochemical saturation" may be forthcoming; in fact, it may already exist among the hundreds of thousands of compounds that failed to qualify as drugs.

Among organs, the liver is an obvious choice for the study of chemical effects because of its well-known plasticity and responsiveness. The brain appears to be quite plastic (e.g., [7]), but it is a complicated mixture of cell types requiring skillful dissection for most experiments. The kidney, while quite responsive, also presents a potentially confounding mixture of cell types. The liver, by contrast, is made up of one predominant cell type which is easy to solubilize: the hepatocyte, representing more than 95% of its mass. Most importantly, the liver performs many homeostatic functions that require rapid modulation of gene expression. It appears that most chemical agents tested affect gene expression in the liver at some dosage (N. Leigh Anderson, unpublished observations), an interesting contrast to our earlier work with lymphocytes, for example, which seem to be much less responsive. Such results conform to the expectation that cells with a homeostatic, physiological role should be more plastic than cells differentiated for a purpose dependent on the action of a limited number of specific genes.

The liver also allows the parallels between in vitro and in vivo systems to be examined in detail. Significant progress has been made in the development of mouse, rat and human hepatocyte culture systems, as well as in precision-cut tissue slices. Using such an array of techniques, it is possible to assemble a matrix of mammalian systems including mouse and rat in vivo on one level and mouse, rat and human in vitro on a second level, and to compare effects between species and between systems. This approach allows us to draw informed conclusions regarding the biochemical "universality" of biological responses among mammals and to offer some insight into the validity of in vitro approaches for toxicological screening. We believe this type of comparison will be necessary if in vitro alternatives are to be adopted in government-mandated safety testing of drugs, consumer products and industrial and agricultural chemicals.

A number of interesting studies have been published using 2-D mapping to examine effects in the rodent liver. A number of investigators have made use of the technique to screen for existing genetic variants [8-11] or induced mutations [12-14], mainly in the mouse. This work builds on the wealth of genetic information available on the mouse and its established position as a mammalian mutation-detection system. While some studies of chemical effects have been undertaken in the mouse [15-17], most have used the rat [18-23]. The examination of the cytochrome P-450 system, in particular, has been carried out almost exclusively on the rat [24, 25].

These considerations lead us to conclude that rodent liver offers the best opportunity to systematically examine an array of gene regulation systems, and ultimately to build a predictive model of large-scale mammalian gene control. The basic underlying foundation of such a project is a reliable, reproducible master 2-D pattern of liver, to which ongoing experimental results can be referred. In this paper, we report such a master pattern for the acidic and neutral proteins of rat liver (pattern F344MST3). In future, this master will be supplemented by maps of basic proteins, and analogous maps of mouse and human liver.



2 Materials and methods 
2.1 Sample preparation 

Liver is an ideal sample material for most biochemical studies, including 2-D analysis. A sample of approximately 0.5 g of tissue is taken from the apical end of the left lobe of the liver. Solubilization is effected as rapidly as practical; a delay of 5-15 min appears to cause no major alteration in liver protein composition if the liver pieces are kept cold (e.g., on ice) in the interim. In the solubilization process, the liver sample is weighed and placed in a glass homogenizer (e.g., 15 mL Wheaton), 8 volumes of solubilizing solution* are added (i.e., 4 mL per 0.5 g tissue), and the mixture is homogenized using first the loose- and then the tight-fitting glass pestle. This takes approximately 5 strokes with each pestle and is carried out at room temperature because urea would crystallize out in the cold. Once the liver sample is thoroughly homogenized in the solubilizer, it is assumed that all the proteins are denatured (by the chaotropic effect of the urea and NP-40 detergent) and the enzymes inactivated by the high pH (~9.5). Therefore these samples may be kept at room temperature until they can be centrifuged or frozen as a group (within several hours of preparation). The samples are centrifuged for 6 × 10^6 g·min (e.g., 500 000 × g for 12 min using a Beckman TL-100 centrifuge). The centrifuge rotor is maintained at just below room temperature (e.g., 15-20 °C), but not too cold, so as to prevent the precipitation of urea. The centrifuge of choice is a Beckman TL-100 because of the sample tube sizes available, but any ultracentrifuge accepting small tubes will suffice. When an appropriate centrifuge is not available near the site of sample preparation, samples can be frozen at -80 °C and thawed prior to centrifugation and collection of supernatants. Each supernatant is carefully removed following centrifugation and aliquoted into at least 4 clean tubes for storage. This is done by transferring all the supernatant to one clean tube, mixing it gently (to assure homogeneous composition) and then dividing it into 4 aliquots. The aliquots are frozen immediately at -80 °C. These multiple aliquots provide insurance against a failed run or freezer breakdown.

* The solubilizing solution is composed of 9 M urea (analytical grade, e.g., BDH or Bio-Rad), 2% NP-40 (Sigma), 0.5% dithiothreitol (DTT; Sigma) and 2% carrier ampholytes (pH 9-11, LKB; these come as a 20% stock solution, so 2% final concentration is achieved by making the solution 10% pH 9-11 Ampholine by volume). A large batch of solubilizer (several hundred mL) is made and stored frozen at -80 °C in aliquots, each sufficient for one day's estimated sample preparation requirement. The solution is never allowed to become warmer than room temperature at any stage during preparation or use, since heating of concentrated urea solutions can produce contaminants that covalently modify proteins, producing artifactual charge shifts. Once thawed, any unused solubilizer is discarded.
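The centrifugation step is specified as a g × time integral (6 × 10^6 g·min) rather than as a fixed speed, so the same clearing can in principle be delivered on another ultracentrifuge by adjusting the run time. A minimal sketch of that arithmetic follows; it assumes a constant relative centrifugal force for the whole run (acceleration and braking are ignored), and the helper names are hypothetical rather than part of the published protocol.

```python
# Sketch: convert a g-min specification into a run time at a chosen RCF.
# Assumption: constant RCF for the whole run (acceleration/braking ignored).

G_MIN_SPEC = 6e6  # 6 x 10^6 g-min, the specification quoted in the text


def run_minutes(g_min: float, rcf_g: float) -> float:
    """Minutes needed to deliver `g_min` at a relative centrifugal force of `rcf_g` x g."""
    if rcf_g <= 0:
        raise ValueError("RCF must be positive")
    return g_min / rcf_g


def delivered_g_min(rcf_g: float, minutes: float) -> float:
    """g-min integral delivered by a run of `minutes` at `rcf_g` x g."""
    return rcf_g * minutes


if __name__ == "__main__":
    # The worked example in the text: 500 000 x g on a Beckman TL-100.
    print(run_minutes(G_MIN_SPEC, 500_000))
    # A hypothetical rotor limited to 100 000 x g would need a longer run.
    print(run_minutes(G_MIN_SPEC, 100_000))
```

For the 500 000 × g example this returns 12 min, matching the text; the 100 000 × g case is purely illustrative.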

2.2 Two-dimensional electrophoresis

Sample proteins are resolved by 2-D electrophoresis using the 20 × 25 cm Iso-Dalt 2-D gel system ([26-29]; produced by LSB and by Hoefer Scientific Instruments, San Francisco) operating with 20 gels per batch. All first-dimensional isoelectric focusing (IEF) gels are prepared using the same single standardized batch of carrier ampholytes (BDH 4-8A in the present case, selected by LSB's batch-testing program for rat and mouse database work**). A 10 µL sample of solubilized liver protein is applied to each gel, and the gels are run for 33 000 to 34 500 volt-hours using a progressively increasing voltage protocol implemented by a programmable high-voltage power supply. An Angelique computer-controlled gradient-casting system (produced by LSB) is used to prepare second-dimensional sodium dodecyl sulfate (SDS) polyacrylamide gradient slab gels in which the top 5% of the gel is 11%T acrylamide, and the lower 95% of the gel varies linearly from 11% to 18%T.

This system has recently been modified so as to employ a commercially available 30.8%T acrylamide/N,N'-methylenebisacrylamide prepared solution (thus avoiding the handling of the solid acrylamide monomer) and three additional stock solutions: buffer (made from Sigma pre-set Tris), persulfate and N,N,N',N'-tetramethylethylenediamine (TEMED). Each gel is identified by a computer-generated filter paper label polymerized into the lower left corner of the gel. First-dimensional IEF tube gels are loaded directly (as extruded) onto the slab gels without equilibration, and held in place by polyester fabric wedges (Wedgies, produced by LSB) to avoid the use of hot agarose. Second-dimensional slab gels are run overnight, in groups of 20, in cooled DALT tanks (10 °C) with buffer circulation. All run parameters, reagent source and lot information, and notations of deviation from expected results are entered by the technician responsible on a detailed, multi-page record of the experiment.

** This material (succeeding certified batches of which are available from Hoefer Scientific Instruments) has the most linear pH gradient produced by any ampholyte tested except for the Pharmacia wide range (which has an unacceptable tendency to bind high-molecular-weight acidic proteins, causing them to streak).
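The second-dimension gradient (a uniform 11%T zone in the top 5% of the slab, then a linear 11% to 18%T ramp over the remaining 95%) can be written as a piecewise function of relative depth. The sketch below assumes depth is expressed as a fraction of total gel height and that the caster mixes only a light (11%T) and a heavy (18%T) solution whose %T combines linearly by volume; it is an illustration, not the Angelique control program.

```python
# Sketch of the SDS slab gradient profile: 11 %T plateau over the top 5 %,
# then a linear ramp from 11 %T to 18 %T over the lower 95 % of the gel.

LOW_T = 11.0    # %T at the top of the gel
HIGH_T = 18.0   # %T at the bottom of the gel
PLATEAU = 0.05  # fraction of gel height held at LOW_T


def percent_t(depth: float) -> float:
    """Acrylamide %T at a relative depth (0.0 = top edge, 1.0 = bottom edge)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    if depth <= PLATEAU:
        return LOW_T
    ramp = (depth - PLATEAU) / (1.0 - PLATEAU)  # 0 at end of plateau, 1 at bottom
    return LOW_T + ramp * (HIGH_T - LOW_T)


def heavy_fraction(target_t: float) -> float:
    """Volume fraction of 18 %T solution to mix with 11 %T solution to reach target_t
    (assumes %T mixes linearly by volume)."""
    if not LOW_T <= target_t <= HIGH_T:
        raise ValueError("target outside the gradient range")
    return (target_t - LOW_T) / (HIGH_T - LOW_T)


if __name__ == "__main__":
    for d in (0.0, 0.05, 0.525, 1.0):
        t = percent_t(d)
        print(f"depth {d:.3f}: {t:.2f} %T, heavy fraction {heavy_fraction(t):.2f}")
```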

2.3 Staining

Following SDS electrophoresis, slab gels are stained for protein using a colloidal Coomassie Blue G-250 procedure in covered plastic boxes, with 10 gels (totalling approximately 1 L of gel) per box. This procedure (based on the work of Neuhoff [30, 31]) involves fixation in 1.5 L of 50% ethanol and 2% phosphoric acid for 2 h, three 30 min washes, each in 2 L of cold tap water, and transfer to 1.5 L of 34% methanol, 17% ammonium sulfate and 2% phosphoric acid for 1 h, followed by the addition of 1 g of powdered Coomassie Blue G-250 stain. Staining requires approximately 4 days to reach equilibrium intensity, whereupon gels are transferred to cool tap water and their surfaces rinsed to remove any particulate stain prior to scanning. Gels may be kept for several months in water with added sodium azide. The water washes remove ethanol that would dissolve the stain (and render the system noncolloidal, with high backgrounds). The concentrated ammonium sulfate and methanol solution is diluted by equilibration with the water volume of the gels to automatically achieve the correct final concentrations for colloidal staining. Practical advantages of this staining approach can be summarized as follows: (i) the low, flat background makes computer evaluation of small spots (max OD < 0.02) possible, especially when using laser densitometry; (ii) up to 1500 spots can be reliably detected on many gels (e.g., rat liver) at loadings low enough to preserve excellent resolution; and (iii) reproducibility appears to be very good: at least several hundred spots have coefficients of reproducibility less than 15%. This value is at least as good as previous CBB methods, and significantly better than many silver stain systems.
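The text states that the concentrated methanol/ammonium sulfate/phosphoric acid solution reaches its working strength by equilibrating with the water held in the gels, but it does not list the resulting concentrations. The sketch below only shows what an idealized mixing model predicts for one box (1.5 L of solution, about 1 L of gel); it assumes the washed gels behave as pure water, that volumes are additive, and that equilibration is complete, none of which is stated in the source.

```python
# Idealized equilibration of the staining solution with the water held in the gels.
# Assumptions: washed gels ~ pure water, additive volumes, complete equilibration.

STOCK_PERCENT = {
    "methanol": 34.0,
    "ammonium sulfate": 17.0,
    "phosphoric acid": 2.0,
}


def equilibrated_percent(stock_percent: float, solution_l: float, gel_l: float) -> float:
    """Final concentration after `solution_l` litres mix with `gel_l` litres of gel water."""
    return stock_percent * solution_l / (solution_l + gel_l)


def final_concentrations(solution_l: float = 1.5, gel_l: float = 1.0) -> dict:
    """Predicted concentration of each component for one staining box."""
    return {name: equilibrated_percent(pct, solution_l, gel_l)
            for name, pct in STOCK_PERCENT.items()}


if __name__ == "__main__":
    for name, pct in final_concentrations().items():
        print(f"{name}: {pct:.1f} %")
```

Under these assumptions each component is diluted by a factor of 1.5/2.5, giving roughly 20% methanol and 10% ammonium sulfate; real gels retain some solids and the washes leave residual water, so actual values will differ somewhat.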

2.4 Positional standardization 

The carbamylated rabbit muscle creatine phosphokinase (CPK) standards [32] are purchased from Pharmacia and BDH. Amino acid compositions, and numbers of residues present in proteins used for internal standardization, are taken from the Protein Identification Resource (PIR) sequence database [33].

2.5 Computer analysis 

Stained slab gels are digitized in red light at 134 micron resolution, using either a Molecular Dynamics laser scanner (with pixel sampling) or an Eikonix 78/99 CCD scanner. Raw digitized gel images are archived on high-density DAT tape (or equivalent storage media), and a greyscale videoprint is prepared from the raw digital image as a hard-copy backup of the gel image. Gels are processed using the Kepler software system (produced by LSB), a commercially available workstation-based software package built on some of the principles of the earlier TYCHO system [34-41]. Procedure PROC008 is used to yield a spotlist giving position, shape and density information for each detected spot. This procedure makes use of digital filtering, mathematical morphology techniques and digital masking to remove the background, and uses full 2-D least-squares optimization to refine the parameters of a 2-D Gaussian shape for each spot. Processing parameters and file locations are stored in a relational database, while various log files detailing operation of the automatic analysis software are archived with the reduced data. The computed resolution and level of Gaussian convergence of each gel are inspected and archived for quality control purposes.
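PROC008 refines a 2-D Gaussian for each spot by full 2-D least-squares optimization. The sketch below is a simpler stand-in, not the Kepler procedure: it fits an axis-aligned Gaussian by linear least squares on the logarithm of background-subtracted pixel intensities, since the log of a Gaussian is a quadratic in x and y. It assumes the background has already been removed, that only one spot lies in the window, and that NumPy is available; the function name and threshold are hypothetical. An estimate of this kind can serve as a starting point for an iterative nonlinear fit.

```python
import numpy as np


def fit_gaussian_spot(window: np.ndarray, threshold: float) -> dict:
    """Fit I(x, y) = A * exp(-(x-x0)^2/(2 sx^2) - (y-y0)^2/(2 sy^2)) to a
    background-subtracted image window, using pixels brighter than `threshold`.

    Taking logs gives ln I = c0 + c1*x + c2*x^2 + c3*y + c4*y^2, which is linear
    in the coefficients and can be solved with ordinary least squares.
    """
    rows, cols = np.nonzero(window > threshold)
    if rows.size < 5:
        raise ValueError("need at least 5 pixels above threshold")
    x = cols.astype(float)
    y = rows.astype(float)
    z = np.log(window[rows, cols])
    design = np.column_stack([np.ones_like(x), x, x**2, y, y**2])
    c, *_ = np.linalg.lstsq(design, z, rcond=None)
    if c[2] >= 0 or c[4] >= 0:
        raise ValueError("pixels do not describe a peaked (Gaussian-like) spot")
    var_x = -1.0 / (2.0 * c[2])
    var_y = -1.0 / (2.0 * c[4])
    x0 = c[1] * var_x
    y0 = c[3] * var_y
    amplitude = np.exp(c[0] + x0**2 / (2 * var_x) + y0**2 / (2 * var_y))
    return {
        "x0": x0,
        "y0": y0,
        "sigma_x": np.sqrt(var_x),
        "sigma_y": np.sqrt(var_y),
        "amplitude": amplitude,
        # integrated density of the fitted Gaussian (a common "spot volume" measure)
        "volume": 2 * np.pi * amplitude * np.sqrt(var_x * var_y),
    }


if __name__ == "__main__":
    # Synthetic, noise-free spot for illustration (rows = y, columns = x).
    yy, xx = np.mgrid[0:40, 0:50]
    spot = 200.0 * np.exp(-(xx - 21.3) ** 2 / (2 * 3.0**2) - (yy - 17.8) ** 2 / (2 * 4.5**2))
    print(fit_gaussian_spot(spot, threshold=1.0))
```

With noisy data the log transform weights faint pixels too heavily, which is one reason a full nonlinear least-squares refinement, as described in the text, is preferable for final spot parameters.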

Experiment packages are constructed using the Kepler experiment definition database to assemble groups of 2-D patterns corresponding to the experimental groups (e.g., treated and control animals). Each 2-D pattern is matched to the appropriate "master" 2-D pattern (pattern F344MST3 in the case of Fischer 344 rat liver), thereby providing linkage to the existing rodent protein 2-D databases. The software allows experiments containing hundreds of gels to be constructed and analyzed as a unit, with up to 100 gels displayed on the screen at one time for comparative purposes and multiple pages to accommodate experiments of > 1000 gels. For each treatment, proteins showing significant quantitative differences vs. appropriate controls are selected using group-wise statistical parameters (e.g., Student's t-test, Kepler procedure STUDENT). Proteins satisfying various quantitative criteria (such as P < 0.001 difference from appropriate controls) are represented as highlighted spots on screen or on computer-plotted protein maps and stored as spot populations (i.e., logical vectors) in a liver protein database. Quantitative data (spot parameters, statistical or other computed values) are stored as real-valued vectors in the database. Analysis of coregulation is performed using a Pearson product-moment correlation (Kepler procedure CORREL) to determine whether groups of proteins are coordinately regulated by any of the treatments. Such groups can be presented graphically on a protein map, and reported together with the statistical criteria used to assess the level of coregulation. Multivariate statistical analysis (e.g., principal components analysis) is performed on data exported to SAS (SAS Institute).
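The STUDENT and CORREL steps reduce to standard statistics on spot-abundance vectors (one value per gel). The sketch below, which is not Kepler code, computes a pooled two-sample Student's t statistic and Pearson's r, and sorts spots into those varying coordinately or inversely with a reference spot at |r| ≥ 0.95. The threshold, the spot labels and the abundance values are hypothetical; converting t into a P value (e.g., for the P < 0.001 criterion) additionally requires the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a: list, b: list) -> float:
    """Pooled-variance two-sample Student's t statistic (a vs. b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def pearson_r(a: list, b: list) -> float:
    """Pearson product-moment correlation between two equal-length vectors."""
    ma, mb = mean(a), mean(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(x * y for x, y in zip(da, db))
    return num / sqrt(sum(x * x for x in da) * sum(y * y for y in db))


def coregulation_groups(spots: dict, reference, min_abs_r: float = 0.95):
    """Split spots into those varying with the reference spot and those varying inversely."""
    ref = spots[reference]
    coordinate, inverse = [], []
    for spot_id, values in spots.items():
        if spot_id == reference:
            continue
        r = pearson_r(ref, values)
        if r >= min_abs_r:
            coordinate.append(spot_id)
        elif r <= -min_abs_r:
            inverse.append(spot_id)
    return sorted(coordinate), sorted(inverse)


if __name__ == "__main__":
    # Hypothetical abundances of four spots across five gels.
    spots = {
        "A": [1.0, 2.0, 3.0, 4.0, 5.0],
        "B": [2.0, 4.0, 6.0, 8.0, 10.5],
        "C": [5.0, 4.0, 3.0, 2.0, 1.0],
        "D": [1.0, 3.0, 2.0, 5.0, 4.0],
    }
    print(coregulation_groups(spots, reference="A"))
    print(student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Here spot "B" tracks the reference, "C" mirrors it, and "D" (r = 0.8) falls below the illustrative cutoff.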

2.6 Graphical data output

Graphical results are prepared in GKS and translated within Kepler into output for any of a variety of devices. Line-drawing output is typically prepared as PostScript and printed on an Apple LaserWriter. Detailed maps presented here have been generated using an ultra-high-resolution PostScript-compatible Linotronic output device. Greyscale graphics are reproduced from the workstation screen using a Seikosha videoprinter. Patterns are shown in the standard orientation, with high molecular mass at the top and acidic proteins to the left.

2.7 Experiment LSBC04 

In the study described here, 12-week-old Charles River male F344 rats were used. Diets were prepared at LSB, based on a Purina 5755M Basal Purified Diet. Lovastatin and cholestyramine were obtained as prescription pharmaceuticals, ground and mixed with the diet at concentrations of 0.075% and 1%, respectively. The high-cholesterol diet was Purina 5801M-A (5% cholesterol plus 1% sodium cholate in the control diet). Animal work was carried out by Microbiological Associates (Bethesda, MD). Animals were acclimatized for one week on the control diet, fed test or control diets for one week, and sacrificed on day 8. Average daily doses of lovastatin and cholestyramine in the appropriate groups were 37 mg/kg/day and 5 g/kg/day, respectively, based on the weight of the food consumed. Liver samples were collected and prepared for 2-D electrophoresis according to the standard liver protocol (homogenization in 8 volumes of 9 M urea, 2% NP-40, 0.5% dithiothreitol, 2% LKB pH 9-11 carrier ampholytes, followed by centrifugation for 30 min at 80 000 × g). Kidney, brain and plasma samples were frozen. Gels were run as described above and the data were analyzed using the Kepler system. Gels were scaled, to remove the effect of differences in protein loading, by setting the summed abundances of a large number of matched spots equal for each gel (linear scaling).
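The linear scaling step can be sketched as follows: every abundance on a gel is multiplied by one factor, chosen so that the summed abundance of a common set of matched spots is the same on every gel. The target total (here the mean of the per-gel sums) and the dictionary layout are assumptions for illustration; the text does not say which reference value Kepler uses.

```python
from typing import Optional


def scale_gels(gels: dict, matched: list, target: Optional[float] = None) -> dict:
    """Linearly scale each gel so that the matched spots sum to the same total.

    gels    : {gel_id: {spot_id: abundance}}
    matched : spot ids present on every gel and used as the loading reference
    target  : common total; defaults to the mean of the per-gel totals
    """
    totals = {}
    for gel_id, spots in gels.items():
        missing = [s for s in matched if s not in spots]
        if missing:
            raise KeyError(f"gel {gel_id} lacks matched spots {missing}")
        totals[gel_id] = sum(spots[s] for s in matched)
    if target is None:
        target = sum(totals.values()) / len(totals)
    return {
        gel_id: {s: v * target / totals[gel_id] for s, v in spots.items()}
        for gel_id, spots in gels.items()
    }


if __name__ == "__main__":
    # Hypothetical gels: gel "B" was loaded with about twice as much protein.
    gels = {
        "A": {1: 10.0, 2: 30.0, 3: 5.0},
        "B": {1: 20.0, 2: 60.0, 3: 12.0},
    }
    print(scale_gels(gels, matched=[1, 2]))
```

After scaling, the reference spots 1 and 2 agree exactly on both gels, and the remaining difference in spot 3 (7.5 vs. 9.0) is no longer confounded with the loading difference.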



3 Results and discussion 

3.1 The rat liver protein 2-D map 

F344MST3 is a standard 2-D pattern of rat liver proteins, based on the Fischer 344 strain. This pattern was initiated from a single 2-D gel and extensively edited in an experiment comparing it to a range of protein loads, so as to include both small spots and well-resolved representations of high-abundance spots. More than 700 rat liver 2-D patterns have been matched to F344MST3 in a series of drug effects and protein characterization experiments, and numerous new spots (induced by specific drugs, for instance) have been added as a result. A modified version including additional spots present in the Sprague-Dawley outbred rat has also been developed (data not shown). Figure 1 shows a greyscale representation and Fig. 2 a schematic plot of the master pattern. More than 1200 spots are included, most of which are visible on typical gels loaded with 10 µL of solubilized liver protein prepared by the standard method and stained with colloidal Coomassie Blue. Master spot numbers (MSN's) have been assigned to all proteins, and appear in the following figures, each showing one quadrant of the pattern. Figure 3 shows the upper left (acidic, high molecular mass) quadrant, Fig. 4 the upper right (basic, high molecular mass) quadrant, Fig. 5 the lower left (acidic, low molecular mass) quadrant, and Fig. 6 the lower right (basic, low molecular mass) quadrant. The quadrants overlap as an aid to moving between them. The gel position (in 100 micron units), isoelectric point (relative to the CPK internal pI standards) and SDS molecular mass (from the calibration curve in Fig. 8) are listed for each spot (Table 1). Because of the precision of the CPK pI values, these parameters can be used to relate spot locations between gel patterns more reliably than pI measurements expressed in pH units. A major objective of current studies is the identification of all major spots corresponding to known liver proteins, as well as rigorous definitions of subcellular organelle contents. Of particular interest to us is the parallel development of identifications in the rat and mouse liver maps, allowing detailed comparisons of gene expression effects in the two systems. The results of these studies will be presented systematically in a later edition of this database.






Database of rat liver proteins






We include here a useful series of 22 orienting identifications as an aid to other users of the rat liver pattern (Table 2).



3.2 Carbamylated charge standards, computed pIs and molecular mass standardization

We have previously shown that the use of a system of closely spaced internal pI markers (made by carbamylating a well-characterized protein) offers an accurate and workable solution to the problem of assigning positions in the pI dimension [32]. The same system, based on 36 protein species made by carbamylating rabbit muscle CPK, has been used here to assign pIs to most rat liver acidic and neutral proteins. The standards were coelectrophoresed with total liver proteins, and the standard spots added to a special version of the master pattern F344MST3. The gel X-coordinates of all liver protein spots lying within the CPK charge train were then transformed into CPK pI positions by interpolation between the positions of immediately adjacent standards (Table 1) using a Kepler vector procedure.
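Assigning a CPK pI from a spot's gel X-coordinate, as described above, is piecewise-linear interpolation between the two flanking standards. A minimal sketch follows; the standard positions and CPK pI values are invented for illustration (the real assignments are listed in Table 1), and spots outside the charge train are left unassigned, as in the text.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def cpk_pi(x: float, standards: List[Tuple[float, float]]) -> Optional[float]:
    """Interpolate a CPK pI for gel X-coordinate `x`.

    standards: (gel_x, cpk_pi) pairs for the carbamylated CPK charge train,
    sorted by gel_x with no repeated positions.
    Returns None for spots lying outside the charge train.
    """
    xs = [s[0] for s in standards]
    if x < xs[0] or x > xs[-1]:
        return None
    i = bisect_right(xs, x)
    if i == len(xs):  # exactly on the last standard
        return standards[-1][1]
    (x0, p0), (x1, p1) = standards[i - 1], standards[i]
    return p0 + (x - x0) * (p1 - p0) / (x1 - x0)


if __name__ == "__main__":
    # Hypothetical charge-train positions (gel X units) and CPK pI values.
    train = [(100.0, -20.0), (140.0, -19.0), (185.0, -18.0), (235.0, -17.0)]
    for x in (90.0, 140.0, 160.0, 235.0):
        print(x, cpk_pi(x, train))
```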


It has proven possible to compute fairly accurate pI values for many proteins from the amino acid composition [42]. We have attempted here to test a further elaboration of this approach, in which we computed pIs for the CPK standards themselves, based on our knowledge of the rabbit muscle CPK sequence and the fact that adjacent members of the charge train typically differ by blockage of one additional lysine residue (Table 3). We compared these values to similar computed pIs for an additional set of carbamylated standards made from human hemoglobin beta chains and a series of rat liver and human plasma proteins of known position and sequence (Fig. 7, Table 4). The result demonstrates good concordance between these systems. Two proteins show significant deviations: liver fatty-acid binding protein (FABP; #1 in Table 4) and protein disulphide isomerase (also listed in Table 4). The FABP spot present on F344MST3 may represent a charge-modified version of a more basic parent spot closer to the expected pI, not resolved in the IEF/SDS gel. Of particular importance is the fact that, by comparing computed pIs of sequenced but unlocated proteins with the CPK pIs, we can assign a probable gel location without making any assumptions regarding the actual gel pH gradient. This offers a useful shortcut, given the vagaries of pH measurement on small-diameter IEF gels. We have used this approach to compute the CPK pIs of all rat and mouse proteins in the PIR sequence database, as an aid to protein identification (data not shown).
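Computing a pI from sequence, and mimicking a carbamylation charge train by removing one lysine positive charge per step, can be sketched with the Henderson-Hasselbalch equation and a bisection search for zero net charge. The pKa values below are generic textbook-style values chosen for illustration, not necessarily the set used in [42]; the model ignores carbamylation of the N-terminus, treats every Cys as a free thiol and neglects neighbour effects, so its absolute pIs should not be expected to match gel positions.

```python
# Illustrative pKa values (assumption; published pI methods use their own sets).
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def charged_groups(sequence: str, blocked_lys: int = 0) -> dict:
    """Count ionizable groups; `blocked_lys` lysines are treated as carbamylated (uncharged)."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    if not 0 <= blocked_lys <= counts["K"]:
        raise ValueError("cannot block more lysines than the sequence contains")
    counts["K"] -= blocked_lys
    counts["Nterm"] = 1
    counts["Cterm"] = 1
    return counts


def net_charge(ph: float, counts: dict) -> float:
    """Net charge at `ph` from the Henderson-Hasselbalch equation."""
    positive = sum(counts.get(g, 0) / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts.get(g, 0) / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(counts: dict, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-5) -> float:
    """Bisection for the pH of zero net charge (net charge falls monotonically with pH)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(mid, counts) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    toy = "MKKLAEKDRGHKYE"  # hypothetical toy sequence, not CPK
    train = [isoelectric_point(charged_groups(toy, k)) for k in range(5)]
    print([round(p, 2) for p in train])  # each blocked lysine lowers the computed pI
```

Because every blocked lysine removes positive charge at all pH values, successive members of the simulated train always have lower computed pIs, which is the property exploited by the CPK charge standards.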

In order to standardize SDS molecular weight (SDS-MW), we have used a standard curve fitted to a series of identified proteins (Fig. 8). Rather than using molecular mass per se, we have elected to use the number of amino acids in the polypeptide chain, as perhaps a better indication of the length of the SDS-coated rod that is sieved by the second-dimension slab. The resulting values were multiplied by the weighted average mass of amino acids in proteins to give predicted molecular masses. Because we use gradient slabs, we have not constrained the fitted curve to conform to any predetermined model; rather, we tried many equations and selected the best using the program "TableCurve" on a PC. The equation chosen was $y = a + bx + c/x^2$, where $y$ is the number of residues, $x$ is the gel Y coordinate, $a$ is 511.83, $b$ is -0.2731 and $c$ is 33 183 801. The resulting fit appears to be fairly good over a broad range of molecular mass.
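Using the fitted constants quoted above, the calibration can be applied directly to a gel Y coordinate. In the sketch below the constants are those given in the text; the mean residue mass is a placeholder parameter (110 Da is a common rule-of-thumb value and is used only for illustration), and the Y coordinate is assumed to be in the same units used for the fit. The curve should only be trusted within the range of Y positions covered by the calibration proteins; far down the gel the linear term eventually drives the predicted residue count below zero.

```python
# Calibration y = a + b*x + c/x**2 (constants as quoted in the text),
# where y = number of residues and x = gel Y coordinate.
A = 511.83
B = -0.2731
C = 33_183_801.0


def residues_from_y(y_coord: float) -> float:
    """Predicted number of amino acid residues for a spot at gel Y coordinate `y_coord`."""
    if y_coord <= 0:
        raise ValueError("Y coordinate must be positive")
    return A + B * y_coord + C / y_coord**2


def predicted_mass(y_coord: float, mean_residue_mass: float = 110.0) -> float:
    """Predicted molecular mass in Da; `mean_residue_mass` is an illustrative placeholder."""
    return residues_from_y(y_coord) * mean_residue_mass


if __name__ == "__main__":
    for y in (500.0, 1000.0, 1500.0):
        print(y, round(residues_from_y(y), 1), round(predicted_mass(y) / 1000, 1), "kDa")
```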

3.3 An example of rat liver gene regulation: Cholesterol metabolism

Experiment LSBC04 was designed as a small-scale test of 
the regulation of cholesterol metabolism in vivo by three 
agents included in the diet: lovastatin (Mevacor, an inhibitor
of HMG-CoA reductase); cholestyramine (a bile acid
sequestrant that has the effect of removing cholesterol 
from the gut-liver recirculation); and cholesterol itself. The 
first two agents should lower available cholesterol and the 
third should raise it, allowing manipulation of relevant 
gene expression control systems in both directions. Such 
an experiment offers an interesting test of the 2-D mapping 
system since most of the pathway enzymes are present in 
low abundance, many are membrane-bound and difficult 
to solubilize, and the pathway itself is complex. Approxima- 
tely 1000 proteins were separated and detected in liver ho- 
mogenates. Twenty-one proteins were found to be affected 
by at least one treatment, and these could be divided into 
several coregulated groups. 

3.3.1 MSN 413 (putative cytosolic HMG-CoA synthase) and sets of spots regulated coordinately or inversely

One group of spots (including a spot assigned to the cyto- 
solic HMG-CoA synthase, MSN 413) showed the expected 
increase in abundance with lovastatin or cholestyramine, 
the synergistic further increase with lovastatin and choles- 
tyramine, and a dramatic decrease with the high cholesterol 
diet. Spot number 413 is the most strongly regulated pro- 
tein in the present experiment, showing a 5- to 10-fold in- 
duction after a 1 week treatment with 0.075% lovastatin and 
1% cholestyramine in the diet (Figs. 9 and 10). Its expres- 
sion follows precisely the expectation for an enzyme whose 
abundance is controlled by the cholesterol level; it is pro- 
gressively increased from the control levels by cholestyra- 
mine, lovastatin and lovastatin plus cholestyramine, and it 
sinks below the threshold of detection in animals fed the 
high-cholesterol diet. This spot has been tentatively identified as the cytosolic HMG-CoA synthase, based on a reaction with an antiserum to that protein provided by Dr. Michael Greenspan at Merck Sharp & Dohme Research Laboratories. This enzyme lies immediately before HMG-CoA reductase in the liver cholesterol biosynthesis pathway, and is known to be coregulated with it. Spot 413 has an SDS molecular weight of about 54 000 and a CPK pI of -11.4, in reasonably close agreement with a molecular weight of 57 300 and a CPK pI of -15.7 computed from the known sequence of the hamster enzyme [43].

Using a classical product-moment correlation test (Kepler 
procedure CORREL), a series of five additional spots was 
found to be coregulated with 413. The level of correlation 
was exceedingly high (> 95%). Two of these, 1250 and 933, 
are at similar molecular weights and approximately one 
charge more acidic than 413 (Fig. 9), indicating that they 
may be covalently modified forms of the 413 polypeptide. 
This suspicion is strengthened by the observation that both 
spots are also stained by the antibody to cytosolic HMG-CoA synthase. The remaining three correlated spots appear [...] an additional [...] pair (1253 and 1001) of [...]. Because these two presumed proteins are present at [...], and HMG-CoA synthase is reported to consist of only one type of polypeptide, they are likely to represent other, very [...].

[A second set of spots] was selected based on a regulatory pattern close to the inverse of that for spot 413 (MSN's including 79, 20 and 47; data not shown). For these proteins, the lowest level of expression occurs with exposure to lovastatin plus cholestyramine and the highest level upon exposure to the high-cholesterol diet. Spots 182 and 79 are highly correlated and lie at [approximately] the same molecular weight; they [may] both be isoforms of a protein. The other four spots probably represent additional enzymes or subunits [...].

3.3.2 MSN 235 and coregulated spots 

A third group of five spots, mainly comprised of mitochondrial proteins including putative mitochondrial HMG-CoA synthase spots, showed a modest induction by lovastatin [...], but little or no effect with any of the other treatments, including the combination of lovastatin and cholestyramine (Fig. 12). This result is [unexpected] because lovastatin [would be] expected to [affect] only the regulation of enzymes of cholesterol synthesis, which is entirely extramitochondrial. [Three of these spots form a closely] packed triad at approximately 30 kDa, and are likely to represent isoforms of one protein. All three spots are stained by an antibody to the mitochondrial form of HMG-CoA synthase [...] obtained from Dr. Greenspan. Subcellular fractionation [also] indicates a mitochondrial location. The other two spots (633 at about 38 kDa and 724 at about 69 kDa) are each present at lower abundance than the members of the [triad] [...].
3.3.3 An example of an anti-synergistic effect

Spot 3[...]6 shows strong induction by lovastatin (two- to threefold), and about half as much induction with lovastatin plus cholestyramine, but without sharing the animal-to-animal heterogeneity pattern of the 235-set (Fig. 13). It is also mitochondrial, and represents the clearest [example of an] anti-synergistic effect of lovastatin and cholestyramine. The existence of such an effect demonstrates that [the two agents] [...] and do not act exclusively through the same regulatory pathway.

3.3.4 Complexity of the cholesterol synthesis pathway 

These results suggest that treatment with lovastatin alone can affect both cytosolic and mitochondrial pathways using HMG-CoA, while cholestyramine, on the other hand, either alone or in combination with lovastatin, [...] on the putative cytosolic pathway [...] the mitochondrial pathway. An explanation for this difference may lie in lovastatin's [...] levels of HMG-CoA and related precursors [...] between the cytosol and [...], whereas cholestyramine should affect [...] cytosolic pathways directly controlled by cholesterol and bile acid levels. It remains to be explained why some proteins of the putative mitochondrial pathway are much more variable in their expression [...].

Examination of all the coregulated [...] quantitative statistical techniques can extract a [...] interesting information from large sets of reproducible [...]. The abundance of spots in the 413 coregulation group, for example, shows an amazing level of concordance [...] expression among the five individuals of the lovastatin plus cholestyramine treatment group. This effect is not due to differences in total protein loading, since these have already been removed by scaling, and since proteins with different regulation patterns can be demonstrated [...] (Fig. 13). Such effects raise the possibility that many gene regulation sets may be revealed through the study of a sufficiently large population of control animals (i.e. without any experimental manipulation). This approach, using natural biological variation in protein expression [...] drug effects, offers an important [...] construction of a large library of control animal patterns.
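The concordance described above (coregulated spots rising and falling together across individual animals once differences in total protein loading have been scaled out) can be quantified with ordinary correlation. The sketch below is a hypothetical illustration, not the authors' statistical procedure: it assumes each gel is scaled by dividing every spot abundance by that gel's total, computes the Pearson correlation of each spot with a reference spot across gels, and keeps spots above an assumed threshold of 0.9. The spot numbers are borrowed from the text, but all abundance values are invented.

```python
import math


def scale_gels(gels):
    """Divide each gel's spot abundances by that gel's total (assumed scaling scheme)."""
    scaled = []
    for gel in gels:
        total = sum(gel.values())
        scaled.append({spot: value / total for spot, value in gel.items()})
    return scaled


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coregulated_with(gels, reference, threshold=0.9):
    """Return {spot: r} for spots whose scaled abundance tracks the reference spot."""
    scaled = scale_gels(gels)
    ref = [gel[reference] for gel in scaled]
    hits = {}
    for spot in scaled[0]:
        if spot == reference:
            continue
        r = pearson(ref, [gel[spot] for gel in scaled])
        if r >= threshold:
            hits[spot] = r
    return hits


# Invented abundances: one dict per gel (one animal each).
gels = [
    {"413": 10.0, "1250": 5.0, "933": 3.0, "367": 40.0},
    {"413": 20.0, "1250": 10.0, "933": 6.5, "367": 35.0},
    {"413": 30.0, "1250": 15.0, "933": 9.2, "367": 30.0},
    {"413": 40.0, "1250": 20.0, "933": 12.1, "367": 25.0},
    {"413": 50.0, "1250": 25.0, "933": 15.0, "367": 20.0},
]

print(coregulated_with(gels, "413"))
```

Because scaling divides every spot in a gel by the same total, two spots that stay in a fixed ratio remain perfectly correlated after scaling, while a loading artifact shared by all spots of one gel is removed.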



4 Conclusions 

In view of the widespread use of rat liver in both basic biochemistry and toxicology, there is a long-term need for a comprehensive database of liver proteins. The rat liver master pattern presented here has proven to be an accurate [...]. As the number of proteins identified and the number of compounds tested for gene expression effects grow, we expect this database to contribute valuable insights into gene regulation. Its practical utility in several areas of mechanistic toxicology is already being demonstrated.

Received September 11, 1991
Figure 8. Plot of number of amino acids versus gel Y-position [...]; curve used to predict molecular mass of unidentified proteins.
Figure 7. (a) Plot of computed isoelectric point versus gel X-position for two sets of carbamylated standard proteins (rabbit muscle CPK, [...]; human hemoglobin β chain, filled diamonds) and several other proteins (shaded squares). (b) The identities of the various proteins represented by the squares are indicated by the numbers in corresponding positions on (a); these refer to Table 4.
Figure 9. Montage showing effects in the region of MSN:413. The montage shows a small window into one portion of the 2-D pattern, one row of windows for each experimental group, and one panel for each gel in the experiment. The left-most panel in each row is a group-specific copy of the master pattern, followed by the panels for the five individual rats in the group. The highlighted protein spots (circles) are spot 413 (on the right of each panel; identified as cytosolic HMG-CoA synthase) and two modified forms of it (1250 and 933). From the top, the rows (experimental groups) are: high cholesterol, controls, cholestyramine, lovastatin, and lovastatin plus cholestyramine.
Figure 10. Bargraph showing the quantitative effects of various treatments on the abundance of MSN:413 (cytosolic HMG-CoA synthase) in the gels of Fig. 9.




Figure 11. Bargraphs of a series of six coregulated spots including MSN:413. In the bargraphs, the abundances of the appropriate spot (master spot number shown at the top of the panel) in each animal are shown. The five five-animal groups are in the order (left to right): high cholesterol, controls, cholestyramine, lovastatin, and lovastatin plus cholestyramine. Each bar within a group represents one experimental animal liver (one 2-D gel). Note the correlated expression of the 6 spots, especially in the two far right (most strongly induced) groups.
Figure 12. Data on a second coregulated group of spots, presented as in Fig. 11. [...] does not.
Figure 13. Data on spot MSN:367, presented as in Fig. 11. This shows unambiguously the anti-synergistic effect of lovastatin and cholestyramine (fifth group) as compared to lovastatin (fourth group). [...] response contrasts strongly with the regulation pattern seen in [...]
Master table of proteins in the rat liver database, showing spot master number, gel position (x and y), isoelectric point relative to CPK standards, and predicted molecular mass (from the standard curve of Fig. 8).
Table 3. Computed pIs of two sets of carbamylated protein standards: rabbit muscle CPK and human hemoglobin (Hb)



Protein name | PIR name | #Asp | #Glu | #His | #Lys | #Arg | NH2- | Calc. pI | Real CPK
pK assumed | | 3.9 | 4.1 | 6.0 | 10.8 | 12.5 | 7.0 | |
Table 4. Computed pIs of some known proteins related to measured CPK pIs

Protein name | PIR name | #Asp | #Glu | #His | #Lys | #Arg | Calc. pI
Amino acid pK assumed in calculation: | | 3.9 | 4.1 | 6.0 | 10.8 | 12.5 |

Creatine phosphokinase (CPK), rabbit muscle
Fatty acid-binding protein, rat hepatic
β2-microglobulin, human
Carbamoyl-phosphate synthase, rat
Proalbumin (serum albumin precursor), rat
Serum albumin, rat
Superoxide dismutase (Cu-Zn SOD), rat
Phospholipase C, phosphoinositide-specific (?), rat
Albumin, human
Apo A-I lipoprotein, rat
proApo A-I lipoprotein, human
NADPH cytochrome P-450 reductase, rat
Retinol binding protein, human
Actin beta, rat
Actin gamma, rat
Apo A-I lipoprotein, human
Apo A-IV lipoprotein, human
Tubulin alpha, rat
F1 ATPase beta, bovine
Tubulin beta, pig
Protein disulphide isomerase (PDI), rat hepatic
Cytochrome b5, rat
Apo C-II lipoprotein, human
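The calculated pIs in these tables come from residue counts combined with the pK values listed in the table headers (Asp 3.9, Glu 4.1, His 6.0, Lys 10.8, Arg 12.5, N-terminal amine 7.0). The sketch below shows one standard way to do such a calculation, assuming Henderson-Hasselbalch charges for each group and a bisection search for the pH of zero net charge; it is not the authors' spreadsheet, and it omits groups that do not appear in the headers (C-terminal carboxyl, Cys, Tyr). Carbamylating n lysines is modeled by subtracting n from the Lys count, which is how a carbamylated charge train such as the CPK standards steps toward lower pI. The residue counts in the example are hypothetical.

```python
# pK values from the table headers above; other ionizable groups
# (C-terminal carboxyl, Cys, Tyr) are deliberately omitted here (assumption).
PK_ACIDIC = {"Asp": 3.9, "Glu": 4.1}
PK_BASIC = {"His": 6.0, "Lys": 10.8, "Arg": 12.5, "Nterm": 7.0}


def net_charge(ph, counts):
    """Net charge at a given pH from residue counts (Henderson-Hasselbalch)."""
    positive = sum(counts.get(group, 0) / (1.0 + 10 ** (ph - pk))
                   for group, pk in PK_BASIC.items())
    negative = sum(counts.get(group, 0) / (1.0 + 10 ** (pk - ph))
                   for group, pk in PK_ACIDIC.items())
    return positive - negative


def isoelectric_point(counts, lo=0.0, hi=14.0, tol=1e-6):
    """Bisection on pH; net charge falls monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(mid, counts) > 0:
            lo = mid  # still positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


def carbamylated(counts, n):
    """Counts after carbamylating n lysines, each losing its positive charge."""
    if n > counts.get("Lys", 0):
        raise ValueError("cannot carbamylate more lysines than the protein has")
    result = dict(counts)
    result["Lys"] = result.get("Lys", 0) - n
    return result


# Hypothetical residue counts, not taken from the tables.
protein = {"Asp": 20, "Glu": 25, "His": 5, "Lys": 30, "Arg": 10, "Nterm": 1}
charge_train = [round(isoelectric_point(carbamylated(protein, n)), 2) for n in range(4)]
print(charge_train)
```

For a molecule with one Asp and a free N-terminus only, the two groups balance halfway between their pKs, so the function returns 5.45, a quick check on the sign conventions.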
An updated two-dimensional gel database of rat liver
proteins useful in gene regulation and drug effect
studies

We have improved upon the reference two-dimensional (2-D) electrophoretic
map of rat liver proteins originally published in 1991 (N. L. Anderson et al.,
Electrophoresis 1991, 12, 907-930). A total of 53 proteins (102 spots) are now
identified, many by microsequencing. In most cases, spots cut from wet, Coomassie
Blue-stained 2-D gels were submitted to internal tryptic digestion [2],
and individual peptides, separated by high-performance liquid chromatography
(HPLC), were sequenced using a Perkin-Elmer 477A sequenator. Additional
spots were identified using specific antibodies.



Figure 1 shows the current annotated 2-D map of F344
rat liver, analyzed using the ISO-DALT system (20 × 25
cm gels) and BDH 4-8 carrier ampholytes. Both the
map itself and the master spot number system remain
the same as shown in the original publication. Table 1
lists the important features of each identification shown,
including the gel position, pI, and Mr for the most
abundant or most basic form of each protein. Using this
extended base of identified spots, a series of four
improved calibration functions has been derived for the
pI and SDS-Mr axes (the first two of which are shown in
Fig. 2A and B). Both forward and reverse functions are
derived, so that one can compute the physical properties
of a spot with a given gel location, or inversely compute
the gel position expected for a protein having given
physical properties:

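$$Y_{\text{rat liver}} = f_{M_r \to \text{rat liver } Y}\left(M_{r,\,\text{sequence-derived}}\right) \qquad (1)$$

$$X_{\text{rat liver}} = f_{pI \to \text{rat liver } X}\left(pI_{\text{sequence-derived}}\right) \qquad (2)$$

$$M_{r,\,\text{gel-derived}} = f_{\text{rat liver } Y \to M_r}\left(Y_{\text{rat liver}}\right) \qquad (3)$$

$$pI_{\text{gel-derived}} = f_{\text{rat liver } X \to pI}\left(X_{\text{rat liver}}\right) \qquad (4)$$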
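In Table 2 the function behind Eq. (1) has the form y = a + b exp(-x/c) (the curve shown in Fig. 2B), with a = 178.74803, b = 1967.7892 and c = 32363.958, giving rat liver gel Y as a function of computed Mr. A minimal Python sketch of that calibration follows. The closed-form inverse is included only to show that this form can be inverted algebraically; it is an illustration, not the authors' Eq. (3), which Table 2 lists as a separately fitted function of a different form.

```python
import math

# Table 2 coefficients for rat gel Y = f(computed Mr), form y = a + b*exp(-x/c).
A = 178.74803
B = 1967.7892
C = 32363.958


def gel_y_from_mr(mr):
    """Eq. (1): predicted rat liver gel Y position for a sequence-derived Mr."""
    return A + B * math.exp(-mr / C)


def mr_from_gel_y(y):
    """Closed-form inverse of gel_y_from_mr; only defined for A < y < A + B."""
    if not A < y < A + B:
        raise ValueError("gel Y lies outside the range of this calibration")
    return -C * math.log((y - A) / B)


print(round(gel_y_from_mr(56561), 1))  # about 521.5
```

For Mr = 56 561 the function gives a gel Y of about 521.5, close to the 520.03 listed for HSP-60 in Table 1. Because the exponential flattens at high Mr, small errors in gel Y translate into large Mr errors for big proteins, which is one reason the choice of fitted form matters.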



A spreadsheet program (in Microsoft Excel) was developed
to facilitate flexible computation of pIs from
amino acid sequence data, and the results were entered
into a relational database (Microsoft Access). A table of
spot positions and sequence-derived pIs and Mrs was
fitted with a large series of analytic equations using
TableCurve (Jandel Scientific), and the four conversion
Eqs. (1)-(4), relating computed pI and gel X coordinate,
or computed molecular weight and gel Y coordinate,
were selected based on criteria of simplicity, goodness
of fit and favorable asymptotic behavior. Table 2 lists the
equations and coefficients. Application of Eqs. (3) and
(4) to a spot's X and Y coordinates, given in [1], produces
improved Mr estimates, and allows computation of pI

Correspondence: Dr. Leigh Anderson, Large Scale Biology Corporation, 9620 Medical Center Drive, Rockville, MD 20850-3338, USA (Tel: +301-424-5989; Fax: +301-762-4892; E-mail: leigh@lsbc.com)

Keywords: Two-dimensional polyacrylamide gel electrophoresis / Liver
directly in pH units, instead of in terms of positions relative
to creatine phosphokinase (CPK) charge standards.
The inverse Eqs. (1) and (2) were used to compute the
gel positions of a series of pI and Mr tick marks. These
tick marks were plotted with SigmaPlot (Jandel),
together with fiducial marks locating several prominent
spots, and the resulting graphic was aligned over the
synthetic gel image (computed by Kepler from the master
gel pattern) using Freelance (Lotus Development). Maps
were printed as PostScript output from Freelance, either
in black and white (as shown here) or in color, where
label color indicates subcellular location (available from
the first author upon request). We have also used the rat
liver 2-D pattern as presented here to calibrate the
patterns of other samples. Using mixtures of rat liver and
mouse liver samples, for example, we made composite
2-D patterns that allow use of the rat pattern to standardize
both axes of the mouse pattern. This was accomplished
by deriving transformations relating the rat and
mouse X, and separately the rat and mouse Y axes
(Table 2, lower half; Fig. 2C and D) based on a series of
spots that coelectrophorese in these closely related
species. These functions were then applied to derive equations
relating the mouse liver X and Y to pI and SDS-Mr
(Eqs. 5 and 6 below). The resulting standardized 2-D
pattern for B6C3F1 mouse liver is shown in Fig. 3.

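$$M_{r,\,\text{mouse liver}} = f_{\text{rat liver } Y \to M_r}\left(f_{\text{mouse liver } Y \to \text{rat liver } Y}\left(Y_{\text{mouse liver}}\right)\right) \qquad (5)$$

$$pI_{\text{mouse liver}} = f_{\text{rat liver } X \to pI}\left(f_{\text{mouse liver } X \to \text{rat liver } X}\left(X_{\text{mouse liver}}\right)\right) \qquad (6)$$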
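Eqs. (5) and (6) rely on rat-to-mouse transfer functions fitted to the coordinates of coelectrophoresing spots, and the text notes that candidate equations were chosen partly on goodness of fit. Every form listed in the lower half of Table 2 is linear in its coefficients, so ordinary least squares is enough to fit one. The sketch below, assuming NumPy and hypothetical paired coordinates, fits two candidate forms and compares their r²; it does not reproduce TableCurve's search or the simplicity and asymptotic-behavior criteria. Coordinates are divided by 1000 before building the basis to keep the design matrix well conditioned, so the fitted coefficients are not on the same scale as those in Table 2.

```python
import numpy as np


def fit_form(x, y, basis):
    """Least-squares fit of y = sum_k coef_k * basis_k(u), with u = x / 1000."""
    u = np.asarray(x, dtype=float) / 1000.0  # rescale for a better-conditioned design matrix
    y = np.asarray(y, dtype=float)
    design = np.column_stack([f(u) for f in basis])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, r2


CANDIDATES = {
    # Same shape as a Table 2 lower-half form: a + b u^2 ln u + c u^2.5 + d u^3
    "table2_form": [np.ones_like, lambda u: u ** 2 * np.log(u),
                    lambda u: u ** 2.5, lambda u: u ** 3],
    # A simple alternative for comparison (hypothetical)
    "quadratic": [np.ones_like, lambda u: u, lambda u: u ** 2],
}


def best_form(x, y, candidates=CANDIDATES):
    """Fit every candidate and return the name with the highest r^2, plus all scores."""
    scores = {name: fit_form(x, y, basis)[1] for name, basis in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical paired gel Y coordinates of coelectrophoresing spots.
rat_y = np.linspace(200.0, 1500.0, 25)
mouse_y = 12.0 + 0.96 * rat_y + 2e-5 * rat_y ** 2
best, scores = best_form(rat_y, mouse_y)
print(best, scores)
```

Here the invented data are exactly quadratic, so the quadratic candidate wins; with real spot pairs, r² alone tends to favor forms with more terms, which is why simplicity and asymptotic behavior also enter the selection.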

A slightly more complex approach can be used to stand- 
ardize samples that have few or no spots co-electropho- 
resing with rat liver proteins. In this case, a 2-D gel is 
prepared with a mixture of the two samples, and four 
functions (forward and backward, each for X and Y) are 
derived relating each sample's own master pattern to the 
composite. The required functions are then applied in a 
nested fashion to yield the desired result (using rat 
plasma as an example): 



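$$M_{r,\,\text{rat plasma}} = f_{\text{rat liver } Y \to M_r}\left(f_{\text{rat plasma+liver } Y \to \text{rat liver } Y}\left(f_{\text{rat plasma } Y \to \text{rat plasma+liver } Y}\left(Y_{\text{rat plasma}}\right)\right)\right) \qquad (7)$$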
Figure 1. Master 2-D gel pattern of Fischer 344 rat liver proteins, annotated with 53 protein identifications and computed pI and Mr axes.
Tentative identifications are in italic type.



Table 1. Proteins identified in the 2-D pattern of F344 rat liver 



MSN a) | Protein ID b) | Protein name | Identification comments | Gel X c) | Experimental pI d) | Gel Y c) | Experimental Mr d)
126 | HADO_HUMAN e) | 3-HA-3,4-DO: 3-hydroxyanthranilate-3,4-dioxygenase | Internal sequence | 871.95 | 5.36 | 921.35 | 30 207
137, 159, 288, 258 | DIDH_RAT | 3HDD: 3-hydroxysteroid dihydrodiol reductase | Ab (T. M. Penning) and pure protein | 1857.52 | 6.51 | 822.52 | 34 406
173 | MUP_RAT | α2u globulin | Presence in liver microsome lumen, abundance in kidney, pI, Mr | 9?9.16 | 5.43 | 1313.81 | 19 549
38 | ACTB_HUMAN | Actin β | Analogy with other mammalian patterns (e.g. human) through coelectrophoresis | 763.40 | 5.19 | 693.? | 41 586
68 | ACTG_HUMAN | Actin γ | Analogy with other mammalian patterns (e.g. human) through coelectrophoresis | 779.42 | 5.21 | 692.26 | 41 677
693 | AFAR_RAT | Aflatoxin B1 aldehyde reductase | Internal sequence | 1993.32 | 6.72 | 818.60 | 34 593
28, 21, 33 | ALBU_RAT | Albumin | Coelectrophoresis with principal plasma protein | 1262.81 | 5.86 | 445.64 | 66 354
43 | DHAM_RAT | Aldehyde dehydrogenase | N-terminal sequence and AAA | 1317.72 | 5.91 | 589.03 | 49 602
96 | ARGI_RAT | Arginase | Internal sequence | 1730.72 | 6.34 | 756.02 | 37 819
117 | SUAR_RAT | Arylsulfotransferase | Internal sequence | 1547.96 | 6.14 | 849.08 | 33 186
1163, 1161, 1162, 20 | GR78_RAT | BiP (GRP-78) | Ab (P. Wiumann) | 665.?3 | 5.01 | 397.39 | 74 564
185 | CAH3_RAT | CA-III | Uncertain; by comparison with mouse | 1996.60 | 6.72 | 1017.02 | 26 887
123 | CALM_HUMAN | Calmodulin | Analogy with human cellular patterns through coelectrophoresis | 23.05 | 4.03 | 1433.25 | 17 419
3, 201, 48, 39, 22, 24 | CRTC_RAT | Calreticulin | Ab (Lance Pohl) | 310.59 | 4.?4 | 433.80 | 68 206
1184, 1186, 114, 174, 218, 5, 167, 157 | CPSM_RAT | Carbamyl phosphate synthase | 2-D of pure protein; confirmed by N-terminal sequence and AAA | 1453.36 | 6.05 | 181.64 | 160 640
54, 61 | CATA_RAT | Catalase | Internal sequence | 2000.81 | 6.73 | 499.64 | 58 968
136 | COX2_RAT | COX-II | Ab (J. W. Taanman), confirmed by internal sequence | 452.57 | 4.61 | 1062.67 | 25 504
87 | CYB5_RAT | Cytochrome b5 | 2-D of pure protein; Ab; confirmed by AAA | 515.68 | 4.73 | 1370.55 | 18 493
41 | CK_RAT e) | Cytokeratin | Location in cytoskeletal fraction | 1165.12 | 5.75 | 569.09 | 51 448
29 | CK_RAT e) | Cytokeratin | Location in cytoskeletal fraction | 743.11 | 5.15 | 605.23 | 48 187
5, 11 | ENPL_RAT e) | Endoplasmin | Ab (F. Witanann) | 567.73 | 4.83 | 263.37 | 112 194
60 | ENOA_RAT | Enolase A | Internal sequence and AAA | 1399.78 | 6.00 | 623.34 | 46 674
27 | ER60_RAT | ER-60 | N-terminal sequence (R. M. Van Frank) | 1184.?0 | 5.77 | 523.51 | 56 169
17 | ATPB_RAT | F1 ATPase β | N-terminal sequence and AAA | 629.06 | 4.95 | 588.83 | 49 620
196 | ATP7_RAT | F1 ATPase δ | Internal sequence | 1227.24 | 5.82 | 1184.65 | 22 310
79 | F16P_RAT | Fructose-1,6-bisphosphatase | Uncertain; by comparison with ID in [...] | 924.54 | 5.44 | 737.77 | 38 858
62, 78 | DHE3_RAT | Glutamate dehydrogenase | N-terminal sequence and internal sequence | 1887.?9 | 6.55 | 566.92 | 51 655
125 | HAST_RAT e) | HAST-1: N-hydroxyarylamine sulfotransferase | Internal sequence | 1297.94 | 5.89 | 861.35 | 32 638
307 | HO1_RAT | Heme oxygenase 1 | Uncertain; available data from internal sequence | 1219.39 | 5.81 | 915.71 | 30 423
413, 1250, 933 | HMCS_RAT | HMG CoA synthase, cytosolic | Ab (J. Germershausen) | 1033.48 | 5.59 | 538.13 | 54 571
133, 144, 235 | HMCS_RAT | HMG CoA synthase, mitochondrial (frag) | Ab (J. Germershausen), N-terminal sequence (Steiner/Lottspeich) | 666.40 | 5.02 | 1019.42 | 26 811
8, 23, 1307 | HS7C_RAT | HSC-70 | Positional homology (with human, etc.) through coelectrophoresis | 811.87 | 5.27 | 425.76 | 69 521
15, 25, 110 | P60_RAT | HSP-60 | Ab (F. Wiuman); confirmed by N-terminal sequence and AAA | 845.09 | 5.32 | 520.03 | 56 561
971 | HS70_RAT e) | HSP-70 | Ab (F. Wurman) | 976.11 | 5.51 | 437.14 | 67 674
 | | HSP-90 | Ab (F. Wurman) | 659.86 | 5.00 | 329 | 90 107
256 | INGI_HUMAN | Interferon-[?] induced protein | Internal sequence | 993.85 | 5.?4 | 1006.04 | 27 237
415, 734 | LAMB_RAT e) | Lamin B | Positional homology with human through coelectrophoresis, nuclear location | 737.10 | 5.14 | 425.19 | 69 615
[...] | LAM[...] | Laminin receptor | Internal sequence | 534.02 | 4.77 | 697.62 | 41 327
227 | FABL_RAT | L-FABP (liver fatty acid binding protein) | Ab (N. M. Bass) | 1586.09 | 6.18 | 1483.43 | 16 ?22
134 | MDHC_MOUSE | Malate dehydrogenase | Internal sequence | 1270.85 | 5.86 | 861.96 | 32 620
18, 35, 226 | GR75_RAT e) | Mitochondrial grp75 | Positional homology with human through coelectrophoresis | 905.67 | 5.41 | 413.67 | 71 589
175, 251 | NCPR_RAT | NADPH P450 reductase | 2-D of pure protein | 824.69 | 5.29 | 393.21 | 75 366
1168, 1170, 1171 | PDI_RAT | PDI: Protein disulfide isomerase | N-terminal sequence (R. M. van Frank), Ab | 564.30 | 4.83 | 528.47 | 55 618
47, 93 | ALBU_RAT | Pro-albumin | Microsomal lumen location, pI, Mr relative to albumin | 1391.03 | 5.99 | 446.68 | 66 195
236 | APA1_RAT | Pro-Apo A-I lipoprotein | Coelectrophoresis with plasma protein | 920.41 | 5.43 | 1137.51 | 23 467
320 | IPK1_BOVIN | Protein kinase C inhibitor 1 | Internal sequence; homology with bovine protein | 1480.01 | 6.08 | 1458.81 | 17 007
152 | PNPH_MOUSE | Purine nucleoside phosphorylase | Internal sequence | 1507.19 | 6.10 | 911.16 | 30 599
1179, 1180, 1181, 1182, 1183 | PYVC_RAT e) | Pyruvate carboxylase | Tentative; 2-D of pure protein (J. G. Henslee, JBC 1979); reported in Biochim. Biophys. Acta 1022, 115-125 | 1485.10 | 6.08 | 223.32 | 131 589
55, 103 | SM30_RAT | SMP-30: Senescence marker protein-30 | Internal sequence | 721.71 | 5.11 | 830.10 | 34 051
135 | SODC_RAT | Superoxide dismutase | AAA; confirmed by internal sequence (R. M. Van Frank) | 1161.24 | 5.74 | 1388.68 | 18 173
172 | TPM_RAT e) | Tm: tropomyosin | Location in cytoskeleton, 2-D position relative to human, Ab | 476.24 | 4.66 | 957.86 | 28 865
277, 56 | TBA1_RAT | Tubulin α | Positional homology with human through coelectrophoresis, cytoskeletal location | 688.22 | 5.06 | 537.67 | 54 620
50, 1225 | TBB1_RAT | Tubulin β | Positional homology with human through coelectrophoresis, cytoskeletal location | 621.29 | 4.93 | 535.48 | 54 855
1224 | VIME_RAT | Vimentin | Positional homology with human through coelectrophoresis, cytoskeletal location | 673.00 | 5.03 | 539.30 | 54 426
[...] | Unknown | ?: not in sequence databases | Internal sequence | 1191.?8 | 5.78 | 610.42 | 42 469
[...] | BBPL_RAT | 23 kDa morphine-binding protein | Internal sequence | 773.?1 | 5.20 | 1182.41 | 22 363

a) Master spot number (MSN) from [1]
b) SwissPROT identifier
c) Coordinates of the most basic or most abundant assigned spot on the F344 master gel pattern
d) pI and Mr of the most basic or most abundant assigned spot, derived from the calibration functions included here
e) SwissPROT style proposed identifier
Abbreviations: AAA, amino acid analysis; Ab, antibody



Table 2. Equations and coefficients

Function | Equation | r² | a | b | c | d | e
Rat gel Y = f(computed Mr) | y = a + b exp(-x/c) | 0.988181021 | 178.74803 | 1967.7892 | 32363.958 | |
Rat gel X = f(computed pI) | y = a + bx + cx/ln x + d/x + e/x^1.5 | 0.99247216 | -8685665.5 | -904497.94 | 3856926.1 | 18276844 | -27154534
Computed Mr = f(rat gel Y) | y = a + bx^c | 0.9960177 | -?464?809 | 19095881 | -0.9086255 | |
Computed pI = f(rat gel X) | y = a + bx + cx^[?] + dx^[?] ln x + ex^[?] | 0.99176499 | 4.044686 | -0.00114238 | 0.0000323 | -0.00000455 | 0.00000000176



Mouse gel Y = f(rat gel Y)
Mouse gel X = f(rat gel X)
Rat gel Y = f(mouse gel Y)
Rat gel X = f(mouse gel X)

y = a + bx + cx^1.5 + dx^[?] ln x + ex/ln x | 0.99951069 | 11861.44 | 678.91666 | -0.78964914 | 1567.5639 | -6953.9592
y = a + bx^2 ln x + cx^2.5 + dx^3 | 0.99926349 | 58.935923 | 0.00091353 | -0.000213688 | 0.00000159
y = a + bx^2 ln x + cx^2.5 + dx^3 | 0.99950032 | 69.740526 | 0.00050772 | -0.000130392 | 0.00000116
y = a + bx + cx^2 ln x + dx^2.5 + ex^3 | 0.9992832 | -198.07189 | 2.0899063 | -0.000671191 | 0.000145189 | -0.000000986



[Figure 2, panels A-D. Fitted forms shown on the panels: (A) y = a + bx + cx/ln x + d/x + e/x^1.5, x-axis computed pI; (B) y = a + b exp(-x/c), x-axis computed MW; (C, D) y = a + bx + cx^2 ln x + dx^2.5 + ex^3 and y = a + bx^2 ln x + cx^2.5 + dx^3]
Figure 2. Plots showing fits of selected equations (continuous curves) to data on identified proteins (square symbols). (A) pI computed from sequence data versus gel X position for identified spots in F344 rat liver; (B) Mr computed from sequence data versus gel Y position for identified spots in F344 rat liver; (C) gel X position for spots in B6C3F1 mouse liver versus X position in F344 rat liver, for coelectrophoresing spots; (D) gel Y position for spots in B6C3F1 mouse liver versus Y position in F344 rat liver, for coelectrophoresing spots. In each case, inverse equations were also computed (Table 2).
Figure 3. Master 2-D gel pattern for B6C3F1 mouse liver, standardized using the F344 rat liver pattern identifications, according to the method
described in the text. Twenty-nine proteins are identified.



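$$pI_{\text{rat plasma}} = f_{\text{rat liver } X \to pI}\left(f_{\text{rat plasma+liver } X \to \text{rat liver } X}\left(f_{\text{rat plasma } X \to \text{rat plasma+liver } X}\left(X_{\text{rat plasma}}\right)\right)\right) \qquad (8)$$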
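Eqs. (7) and (8) chain three fitted functions: from the plasma sample's own master pattern to the plasma-plus-liver composite, from the composite to the rat liver pattern, and from rat liver coordinates to Mr or pI. A minimal Python sketch of that nesting treats each step as any callable and applies them from the innermost step outward. The three linear maps in the example are hypothetical stand-ins for the fitted functions, not values from Table 2.

```python
from functools import reduce
from typing import Callable

Transform = Callable[[float], float]


def compose(*steps: Transform) -> Transform:
    """Chain transforms left to right: compose(f, g, h)(x) == h(g(f(x)))."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)


# Hypothetical stand-ins for the three fitted functions in Eq. (8).
def plasma_x_to_composite_x(x: float) -> float:
    return 1.02 * x + 5.0


def composite_x_to_liver_x(x: float) -> float:
    return 0.98 * x - 3.0


def liver_x_to_pi(x: float) -> float:
    return 4.0 + 0.0015 * x


pi_of_plasma_spot = compose(plasma_x_to_composite_x, composite_x_to_liver_x, liver_x_to_pi)
print(pi_of_plasma_spot(1000.0))
```

Keeping each fitted function separate means a new sample type only needs its own pair of composite-gel transforms; the rat liver calibration at the end of the chain is reused unchanged.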

This unified approach, in which one well-populated 2-D
pattern is used to standardize a family of other patterns,
has the additional advantage that the resulting pI and Mr
scales are directly compatible. Hence one can compare
the relative pIs of mouse and rat versions of a sequenced
protein in a consistent pI measurement system,
and select likely inter-species analogs based on
positional relationships on common scales. Adoption of
immobilized pH gradient (IPG) technology [4-7] will
result in substantial improvements in pI positional
reproducibility for standard 2-D maps such as those
presented here; however, we believe that our approach will
continue to be useful in establishing the empirical pH
gradient actually achieved by such gels under given
experimental conditions (temperature, urea concentration,
etc.), and in relating patterns run on different IPG
ranges and using different lots of IPG gels (between
which some variation will persist). Development of
rodent organ maps is a continuing effort in our laboratories
[8-10], and results in regular additions of identified
proteins. Those who wish to receive current rodent liver
maps, with color annotations, should send a stamped
self-addressed envelope to the first author.



We would like to thank the individuals who provided antibodies mentioned in Table 1, and R. M. van Frank for unpublished sequence data.
Introduction

The advent of large genome sequencing projects has changed the scale of biology.
Over a relatively short period of time, we have witnessed the elucidation of the
complete nucleotide sequence of bacteriophage λ (Sanger et al., 1982) and the nucleotide
sequence of a eukaryotic chromosome (Oliver et al., 1992), and in the near future will
see the definition of all open reading frames of some simple organisms, including
Mycoplasma pneumoniae, Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis
elegans and Arabidopsis thaliana. Nevertheless, genome sequencing projects
are not an end in themselves. In fact, they only represent a starting point to understanding
the function of an organism. A great challenge that biologists now face is how the
co-expression of thousands of genes can best be examined under physiological and
pathophysiological conditions, and how these patterns of expression define an organism [...]

There are two approaches that can be used to examine gene expression on a large
scale. One uses nucleic acid-based technology, the other protein-based technology.
The most promising nucleic acid-based technology is differential display of mRNA
(Liang and Pardee, 1992; Bauer et al., 1993), which uses polymerase chain reaction
with arbitrary primers to generate thousands of cDNA species, each of which corresponds
to an expressed gene or part of a gene. However, it is currently unclear if this technique
can be developed to reliably assay the expression of thousands of genes or
identify all cDNA species, and the approach does not easily allow systematic
screening. Analysis of gene expression by the study of proteins present in a cell or
tissue presents a favorable alternative. This can be achieved by use of two-dimensional
(2-D) gel electrophoresis, qualitative computer image analysis, and protein identification
techniques to create 'reference maps' of all detectable proteins. Such reference
maps establish patterns of normal and abnormal gene expression in the organism, and
allow the examination of some post-translational protein modifications which are
functionally important for many proteins. It is possible to screen proteins systematically
from reference maps to establish their identities.

To define protein-based gene expression analysis, the concept of the 'proteome'
was recently proposed (Wilkins et al., 1995; Wasinger et al., 1995). A proteome is the
entire PROTein complement expressed by a genOME, or by a cell or tissue type. The
concept of the proteome differs from that of the genome: while there
is only one definitive genome of an organism, the proteome is an entity which can
change under different conditions, and can be dissimilar in different tissues of a single
organism. A proteome nevertheless remains a direct product of a genome. Interestingly,
the number of proteins in a proteome can exceed the number of genes present,
as protein products expressed by alternative gene splicing or with different
post-translational modifications are observed as separate molecules on a 2-D gel. As an
extrapolation of the concept of the 'genome project', a 'proteome project' is research
which seeks to identify and characterise the proteins present in a cell or tissue and
define their patterns of expression.

Proteome projects present challenges of a similar magnitude to those of genome
projects. Technically, the 2-D gel electrophoresis must be reproducible and of high
resolution, allowing the separation and detection of the thousands of proteins in a cell.
Low copy number proteins should be detectable. There should be computer gel image
analysis systems that can qualitatively and quantitatively catalog the electrophoretically
separated proteins, to form reference maps. A range of rapid and reliable techniques
must be available for the identification and characterisation of proteins. As a
consequence of a proteome project, protein databases must be assembled that contain
reference information about proteins; such databases must be linked to genomic
databases and protein reference maps. Databases should be widely accessible and easy
to use.

Recently, there have been many changes in the techniques and resources available
for the analysis of proteomes. It is the aim of this chapter to discuss the status of the
areas outlined above, and to review briefly the progress of some current proteome
projects.



Two-dimensional electrophoresis of proteomes 

Two-dimensional (2-D) gel electrophoresis involves the separation of proteins by their
isoelectric point in the first dimension, then separation according to molecular weight
by sodium dodecyl sulfate electrophoresis in the second dimension. Since first
described (Klose, 1975; O'Farrell, 1975; Scheele, 1975), it has become the method of
choice for the separation of complex mixtures of proteins, albeit with many modifications
to the original techniques. 2-D electrophoresis forms the basis of proteome
projects through separating proteins by their size and charge (Hochstrasser et al.,
1992; Celis et al., 199[?]; Garrels and Franza, 1989; VanBogelen et al., 1992). Current
protocols can resolve two to three thousand proteins from a complex sample on a
single gel (Figure 1).



2-D GEL RESOLUTION AND REPRODUCIBILITY 

A primary challenge of separating complex mixtures of proteins by 2-D gel
electrophoresis has been to achieve high resolution and reproducibility. High resolution
ensures that a maximum of protein species are separated, and high reproducibility is
[...] the use of piperazine diacrylyl as a gel crosslinker and the addition of thiosulfate in the
catalyst system has been shown to give better resolution and higher sensitivity
detection (Hochstrasser and Merril, 1988; Hochstrasser, Patchornik and Merril,
1988).
Table 1: Common stains for gels or blots and their applications
example, some glycoproteins are not stained by Coomassie blue (Goldberg et al.,
1988), and many organic dyes are unsuitable for protein detection on PVDF if samples
are to be used for direct matrix-assisted laser desorption ionisation mass spectrometry
(Strupat et al., 1994).

Although most means of protein detection give some indication of the quantities of
protein present, in general they cannot be used for global quantitation. This is because
no protein stain is able consistently to detect proteins over a wide range of
concentrations, isoelectric points and amino acid compositions, and with a variety of
post-translational modifications (Goldberg et al., 1988; Li et al., 1989). Furthermore,
there are large differences in staining pattern when identical gels or blots are subjected
to different stains, including amido black, imidazole zinc, india ink, ponceau S,
colloidal gold, or Coomassie blue (Tovey, Ford and Baldo, 1987; Ona et al., 1992).
The most common means of quantitating large numbers of proteins in a 2-D gel
involves the radiolabelling of protein samples prior to electrophoresis, and protein
quantitation based on fluorography and image analysis or liquid scintillation counting
(Garrels, 1989; Celis and Olsen, 1994). However, proteins which do not contain
methionine cannot be detected if only [35S]methionine is used for labelling. Amino
acid analysis of protein spots visualised by other techniques presents a likely means of
protein quantitation for the future.



BLOTTING OF PROTEINS TO MEMBRANES 



Electrophoretic blotting of proteins from two-dimensional polyacrylamide gels to
membranes presents many options for protein identification and microcharacterisation
which are not possible when proteins remain in gels. For example, when proteins are
blotted to polyvinylidene difluoride (PVDF) membranes, they can be identified by
N-terminal sequencing, amino acid analysis, or immunoblotting, or they may be subjected
to endoproteinase digestion, monosaccharide analysis, phosphate analysis, or direct
matrix-assisted laser desorption ionisation mass spectrometry (Matsudaira, 1987;
Wilkins et al., 1995; Jungblut et al., 1994; Sutton et al., 1995; Rasmussen et al., 1994;
Weitzhandler et al., 1993; Murthy and Iqbal, 1991; Eckerskorn et al., 1992). It is
possible to combine some of these procedures on a single protein spot on a PVDF
membrane (Packer et al., 1995; Wilkins et al., submitted; Weitzhandler et al., 1993).
This is useful when minimal amounts of protein are available for analysis. These
techniques will be explored in detail later in this review. Notwithstanding the above,
there are some disadvantages associated with blotting of proteins to membranes.
There is always loss of sample during blotting procedures (Eckerskorn and Lottspeich,
1993), and common protein detection methods are less sensitive or not applicable to
membranes (Table 1), presenting difficulties for the analysis of low abundance
proteins. Detailed discussion of the merits of available membranes and common
blotting techniques can be found elsewhere (Eckerskorn and Lottspeich, 199[?]; Strupat
et al., 1994; Patterson, 1994).



2-D gel analysis, documentation, and proteome databases 

Following protein electrophoresis and detection, detailed analysis of gel images is undertaken with computer systems. For proteome projects, the aim of this analysis is to catalogue all spots from the 2-D gel in a qualitative and, if possible, quantitative manner, so as to define the number of proteins present and their levels of expression. Reference gel images, constructed from one or more gels, form the basis of two-dimensional gel databases. These databases also contain protein spot identities and details of their post-translational modifications. 2-D gel databases are beginning to be linked to or integrated with comprehensive protein and nucleic acid databases (Neidhardt et al., 1989; Simpson et al., 1992; Appel et al., 1994), and organism databases, containing DNA sequence data, chromosomal map locations, reference 2-D gels and protein functional information for an organism, are becoming established as genome and proteome projects progress (VanBogelen et al., 1992; Yeast Protein Database cited in Garrels et al., 1994).



GEL IMAGE ANALYSIS AND REFERENCE GELS

After 2-D electrophoresis and protein visualisation by staining, fluorography or phosphorimaging, images of gels are digitised for computer analysis by an image scanner, laser densitometer, or charge-coupled device (CCD) camera (Garrels, 1989; Celis et al., 1990a; Urwin and Jackson, 1993). All systems digitise gels with a resolution of 100-200 µm, and can detect a wide range of densities or shading (256 or more grey scales). Following this, gel images are subjected to a series of manipulations to remove vertical and horizontal streaking and background haze, to detect spot positions and boundaries, and to calculate spot intensity (Figure 3). A standard spot (SSP) number, containing vertical and horizontal positional information, is assigned to each detected spot and becomes the protein's reference number. Table 2 lists some notable software packages which process 2-D gel images.
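As a rough illustration of the processing steps just described, the sketch below removes constant row and column offsets (a crude stand-in for streak and haze removal), treats local intensity maxima above a threshold as spots, and sums the pixels around each maximum as its intensity. It is a toy written for illustration, not the algorithm of any package in Table 2; the functions `remove_streaks` and `find_spots`, the threshold and the window radius are all assumptions.

```python
def remove_streaks(image):
    """Subtract each row's minimum, then each column's minimum.

    Crude stand-in for haze/streak removal: a constant offset along a row
    (horizontal streak) or along a column (vertical streak) is removed.
    """
    rows = [[v - min(row) for v in row] for row in image]
    col_min = [min(col) for col in zip(*rows)]
    return [[v - col_min[j] for j, v in enumerate(row)] for row in rows]


def find_spots(image, threshold, radius=1):
    """Return (row, col, integrated_intensity) for each local maximum >= threshold.

    Intensity is summed over a (2*radius+1)^2 window around the maximum.
    Plateaus of equal values are not merged in this sketch.
    """
    h, w = len(image), len(image[0])
    spots = []
    for i in range(h):
        for j in range(w):
            v = image[i][j]
            if v < threshold:
                continue
            window = [
                image[y][x]
                for y in range(max(0, i - radius), min(h, i + radius + 1))
                for x in range(max(0, j - radius), min(w, j + radius + 1))
            ]
            if v == max(window):
                spots.append((i, j, sum(window)))
    return spots


# Toy 5 x 6 "gel": uniform haze of 10, a vertical streak (+3) in the last
# column, and one spot centred at row 2, column 2.
gel = [
    [10, 10, 10, 10, 10, 13],
    [10, 10, 20, 10, 10, 13],
    [10, 20, 40, 20, 10, 13],
    [10, 10, 20, 10, 10, 13],
    [10, 10, 10, 10, 10, 13],
]
print(find_spots(remove_streaks(gel), threshold=5))  # [(2, 2, 70)]
```

After offset removal the haze and the streak vanish, leaving a single spot whose window sums to 70.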



Table 2: Some software packages for the analysis of gel images



As there are difficulties in the electrophoresis of samples with 100% reproducibility, reference gel images are often constructed from many gels of the same sample (Garrels and Franza, 1989; Neidhardt et al., 1989). Since this involves the matching of 2000 to 4000 proteins from one gel to another, it presents a considerable challenge to image analysis systems. Matching of gels is usually initiated by an operator, who manually designates approximately 50 prominent spots as "landmarks" on gels to be cross-matched. Proteins which match are then established around landmarks, using computer-based vector algorithms to extend the matching over the entire gel. Close to 100% of spots from complex samples can be matched by these methods, although different degrees of operator intervention may be required (Olsen and Miller, 1988; Lemkin and Lester, 1989; Garrels, 1989; Myrick et al., 1993).
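Landmark-based matching can be sketched as follows, under the assumption (ours, not that of any cited package) that the distortion between two gels is well approximated by a single affine transform. The transform is fitted by least squares to operator-chosen landmark pairs, and every other spot is then paired with the nearest reference spot within a tolerance. Real gels distort non-linearly, which is one reason the cited systems extend matching outward from landmarks rather than relying on one global fit. All function names, spot labels and coordinates are hypothetical, and four landmarks stand in for the roughly 50 used in practice.

```python
def solve3(a, b):
    """Solve the 3x3 system a.x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine map from gel coordinates (src) to reference coordinates (dst).

    Returns ((a, b, c), (d, e, f)) with u = a*x + b*y + c and v = d*x + e*y + f.
    Needs at least three non-collinear landmark pairs.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return tuple(solve3(ata, atu)), tuple(solve3(ata, atv))


def match_spots(spots, reference, transform, tol):
    """Pair each spot with the nearest reference spot if it lies within tol after mapping."""
    (a, b, c), (d, e, f) = transform
    matches = {}
    for sid, (x, y) in spots.items():
        u, v = a * x + b * y + c, d * x + e * y + f
        best = min(reference, key=lambda r: (reference[r][0] - u) ** 2 + (reference[r][1] - v) ** 2)
        bu, bv = reference[best]
        if (bu - u) ** 2 + (bv - v) ** 2 <= tol ** 2:
            matches[sid] = best
    return matches


landmarks_gel = [(0, 0), (10, 0), (0, 10), (10, 10)]
landmarks_ref = [(1, -3), (21, -3), (1, 7), (21, 7)]
t = fit_affine(landmarks_gel, landmarks_ref)
gel_spots = {"b1": (5, 5), "b2": (3, 8), "b3": (20, 20)}
ref_spots = {"r1": (11, 2), "r2": (7, 5), "r3": (30, 30)}
print(match_spots(gel_spots, ref_spots, t, tol=1.0))  # {'b1': 'r1', 'b2': 'r2'}
```

Spot `b3` maps far from every reference spot and is left unmatched, which is where operator intervention would come in.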
Figure 3. Computer processing of gel images. Shown is a wide pI range 2-D separation of human liver proteins, processed by Melanie software (Appel et al., 1991). (A) Original gel image as captured by laser densitometer. (B) Gel image after processing to remove streaking and background. (C) Outline definition of all spots on the gel.
Progress with proteome projects
CALCULATION OF PROTEIN ISOELECTRIC POINT AND MOLECULAR WEIGHT

Estimation of the isoelectric point (pI) and molecular weight (MW) of proteins from 2-D gels provides fundamental parameters for each protein, which are also of use during identification procedures (see following section). The pI and MW of proteins are recorded in 2-D gel databases. Accurate estimations of protein pI and MW can be obtained by using 20 or more known proteins on a reference map to construct standard curves of pI and molecular weight, which are then used to calculate the estimated pI and MW of unknown proteins (Neidhardt et al., 1989; Garrels and Franza, 1989; VanBogelen, Hutton and Neidhardt, 1990; Anderson and Anderson, 1991; Anderson et al., 1991; Latham et al., 1992). Alternatively, the MW of individual proteins blotted to PVDF can be determined very accurately by direct mass spectrometry (Eckerskorn et al., 1992). Where immobilised pH gradients are used, the focusing position of proteins allows their pI to be measured within 0.15 units of that calculated from the amino acid sequence (Bjellqvist et al., 1993a). It must be noted, however, that proteins carrying post-translational modifications may migrate to unexpected pI or MW positions during electrophoresis (Packer et al., 1995).
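A standard-curve calculation of this kind might look like the sketch below. It assumes, as is usual for SDS gels, that log10(MW) varies linearly with vertical migration distance, and it interpolates pI piecewise-linearly between known proteins along the horizontal axis, which is reasonable for a linear immobilised pH gradient. The function names and the three-point standard sets are illustrative only; the text recommends 20 or more known proteins.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_mw(standards, position):
    """standards: (vertical position, known MW) pairs; log10(MW) assumed linear in position."""
    slope, intercept = linear_fit([p for p, _ in standards],
                                  [math.log10(mw) for _, mw in standards])
    return 10 ** (slope * position + intercept)


def estimate_pi(standards, position):
    """standards: (horizontal position, known pI) pairs; interpolate between bracketing proteins."""
    pts = sorted(standards)
    for (x0, p0), (x1, p1) in zip(pts, pts[1:]):
        if x0 <= position <= x1:
            return p0 + (p1 - p0) * (position - x0) / (x1 - x0)
    raise ValueError("position lies outside the calibrated range")


mw_standards = [(100, 100000), (200, 10000), (300, 1000)]
pi_standards = [(10, 4.0), (50, 6.0), (90, 8.0)]
print(round(estimate_mw(mw_standards, 250)))  # 3162
print(estimate_pi(pi_standards, 30))          # 5.0
```

Post-translationally modified proteins, as noted above, can fall off these curves, so such estimates are best treated as windows rather than exact values.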

SPOT QUANTITATION AND EXPRESSION ANALYSIS

A major challenge faced in proteome projects is the quantitative analysis of proteins separated by 2-D electrophoresis. The most accurate means of protein quantitation is to determine chemically the amount of each protein present by amino acid compositional analysis. However, the current method of choice for quantitative analysis of many proteins is to radiolabel samples with [35S] methionine or 14C amino acids, perform the 2-D electrophoresis, and measure protein levels in disintegrations per minute (dpm) or units of optical density. Quantitation is achieved either by liquid scintillation counting, or by gel image analysis where spot densities are quantitated by reference to gel calibration strips containing known amounts of radiolabelled protein, or against the integrated optical density of all spots visualised (Vandekerckhove et al., 1990; Celis et al., 1990b; Celis and Olsen, 1994; Garrels, 1989; Latham, Garrels and Solter, 1993; Fey et al., 1994). All approaches effectively allow data to be normalised against the total disintegrations per minute loaded onto the gel. Limitations that remain with radiolabelling methods are that absolute quantitation is not achieved, because all proteins have varying amounts of any amino acid, and that only easily labelled samples can be investigated. Quantitative silver staining presents an alternative (Giometti et al., 1991; Harrington et al., 1992; Rodriguez et al., 1993; Myrick et al., 1993), which when undertaken with [35S]thiourea (Wallace and Saluz, 1992a,b) is of extremely high sensitivity.
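Normalisation of spot radioactivity can be illustrated in a few lines. The sketch expresses each spot either as parts per million of the total dpm loaded, or as a fraction of the summed density of all detected spots; the function names, SSP numbers and values are invented. As noted above, both give relative rather than absolute quantities.

```python
def normalise_to_load(spot_dpm, total_dpm_loaded):
    """Express each spot as parts per million (ppm) of the total dpm loaded onto the gel."""
    return {ssp: 1e6 * dpm / total_dpm_loaded for ssp, dpm in spot_dpm.items()}


def normalise_to_spots(spot_density):
    """Alternative: express each spot as a fraction of the summed density of all detected spots."""
    total = sum(spot_density.values())
    return {ssp: value / total for ssp, value in spot_density.items()}


# The same proteins on two gels loaded with different amounts of label.
gel_a = {"2101": 500.0, "3304": 1500.0}
gel_b = {"2101": 1000.0, "3304": 3000.0}
print(normalise_to_load(gel_a, 1_000_000)["2101"])  # 500.0
print(normalise_to_load(gel_b, 2_000_000)["2101"])  # 500.0
print(normalise_to_spots(gel_a))                    # {'2101': 0.25, '3304': 0.75}
```

Doubling the load doubles the raw dpm of spot 2101, but its normalised level is unchanged, which is what makes gel-to-gel comparison possible.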

When protein spots from samples prepared under different conditions are quantitated and matched from gel to gel, it becomes possible to examine changes and patterns in protein expression. Large-scale investigation of up- and down-regulation of proteins, and of their appearance and disappearance, can be undertaken. For example, simian virus 40 transformed human keratinocytes were shown to have 177 up-regulated and 88 down-regulated proteins compared to normal keratinocytes (Celis and Olsen, 1994); detailed synthesis profiles of 1200 proteins have been established in 1- to 4-cell mouse embryos (Latham et al., 1991, 1992); and 4 proteins out of 1971 were found to be markers for cadmium toxicity among urinary proteins (Myrick et al., 1993). Complex global changes in protein expression as a result of gene disruptions have also been investigated (S. Fey and P. Mose Larsen, personal communication). Impressively, large gel sets showing protein expression under different conditions can be globally investigated using statistical methods that find groups of related objects within a set. For example, the REF52 rat cell line database, consisting of 79 gels from 12 experimental groups where each gel contains quantitative data for 1600 cross-matched proteins, has been analysed by cluster analysis (Garrels et al., 1990). This revealed clusters of proteins that, for example, were induced or repressed similarly under simian virus 40 or adenovirus transformation, suggesting a common mechanism. Protein groups that were induced or repressed during culture growth to confluence were also found. The potential of these approaches for investigating cellular control mechanisms is clearly immense. It is equally clear that investigations of gene expression on this scale are currently technically impossible using nucleic acid-based techniques.
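Once spots are normalised and matched, classifying changes between two conditions is straightforward. The sketch below flags matched spots whose level changes by at least a chosen factor, and lists spots that appear or disappear. The two-fold threshold, the function name and the SSP numbers are illustrative assumptions; a real study would also need replicate gels to judge which changes are significant.

```python
def compare_expression(control, treated, min_fold=2.0):
    """Classify matched spots (keyed by SSP number) between two normalised gels."""
    shared = control.keys() & treated.keys()
    return {
        "up": sorted(s for s in shared if treated[s] >= min_fold * control[s]),
        "down": sorted(s for s in shared if treated[s] * min_fold <= control[s]),
        "appeared": sorted(treated.keys() - control.keys()),
        "disappeared": sorted(control.keys() - treated.keys()),
    }


control = {"1101": 100.0, "2203": 50.0, "3305": 80.0, "4402": 10.0}
treated = {"1101": 250.0, "2203": 20.0, "3305": 90.0, "5501": 30.0}
print(compare_expression(control, treated))
# {'up': ['1101'], 'down': ['2203'], 'appeared': ['5501'], 'disappeared': ['4402']}
```

Cluster analysis of the kind applied to the REF52 database generalises this idea from two conditions to many, grouping spots whose levels rise and fall together.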

Table 3: Some proteome databases and their special features

E. coli gene-protein database
  Special features: Gel spots linked with GenBank and Kohara clones; quantitative spot measurements under different growth conditions
  References: VanBogelen and Neidhardt; VanBogelen et al., 1992

Human heart databases
  Special features: Identification of disease markers; two separate databases have been established
  References: Baker et al.; Corbett et al.; Jungblut et al.

Human keratinocyte database
  Special features: Extensive identifications; quantitative spot measurements of transformed cells; identification of disease markers
  References: Celis et al.; Celis et al.; Celis and Olsen, 1994

Mouse embryo database
  Special features: Quantitative spot measurements through 1- to 4-cell stage
  References: Latham et al., 1991; Latham et al., 1992

Mouse liver database (Argonne Protein Mapping Group)
  Special features: Documents changes due to exposure to ionising radiation and toxic chemicals
  References: Giometti, Taylor and Tollaksen, 1992

Rat liver epithelial database
  Special features: Detailed subcellular fractionation studies
  References: Wirth et al.

Rat liver database
  Special features: Extensive studies on regulation of proteins by drugs and toxic agents
  References: Anderson and Anderson; Anderson et al.; Richardson, Horn and Anderson

REF52 rat cell line database
  Special features: Accessible via World Wide Web; quantitative spot measurements under different conditions
  References: Garrels and Franza, 1989; Boutell et al., 1994

SWISS-2DPAGE, containing human reference maps
  Special features: Accessible via World Wide Web; completely integrated with SWISS-PROT and SWISS-3DIMAGE
  References: Appel et al.; Hochstrasser et al.; Hughes et al.; Golaz et al.

Yeast Protein Database (YPD) and Yeast Electrophoresis Protein Database (YEPD)
  Special features: Completely cross-referenced organism database; YPD has extensive information on over ... proteins; YEPD has many identifications
  References: Garrels et al., 1994



FEATURES OF PROTEOME DATABASES 






Proteome projects rely heavily on computer databases to store information about all proteins expressed by an organism. 'Proteome databases' should contain detailed information on proteins already characterised elsewhere, as well as protein data from 2-D gels such as apparent pI and MW, expression level under different conditions, subcellular localisation, and information on post-translational modifications. Images of reference 2-D gels, showing protein SSP numbers and protein identification, should also be included. Ideally, proteome databases should be accessible with Macintosh or IBM personal computers and easy to use. Some proteome databases and the areas they cover are listed in Table 3. Databases range from collections of annotated gels to large databases of images integrated with protein and nucleic acid sequence banks.
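One way to hold the per-spot information listed above is a simple record type, shown here with a query for spots inside a pI/MW window. The field names, the ±0.25 pI and ±20% MW defaults, the condition names and the example entries are hypothetical, not the schema of any database in Table 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpotEntry:
    """One spot on a reference map (illustrative fields, not a published schema)."""
    ssp: str                                  # standard spot number
    pi: float                                 # apparent isoelectric point
    mw: float                                 # apparent molecular weight (Da)
    identity: Optional[str] = None            # e.g. a sequence-database accession, once identified
    expression: Dict[str, float] = field(default_factory=dict)  # condition -> normalised level
    localisation: Optional[str] = None        # subcellular location, if known
    modifications: List[str] = field(default_factory=list)      # post-translational modifications


def spots_near(entries, pi, mw, pi_tol=0.25, mw_frac=0.2):
    """Return SSP numbers of entries whose apparent pI and MW fall in a window around (pi, mw)."""
    return [e.ssp for e in entries
            if abs(e.pi - pi) <= pi_tol and abs(e.mw - mw) <= mw_frac * mw]


ref_map = [
    SpotEntry("2101", 5.9, 44000, identity="hypothetical-A", localisation="cytoplasm"),
    SpotEntry("3304", 6.8, 21000, modifications=["phosphorylation"]),
    SpotEntry("4402", 6.0, 47000, expression={"control": 500.0, "treated": 1200.0}),
]
print(spots_near(ref_map, pi=6.0, mw=45000))  # ['2101', '4402']
```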

One example of an integrated proteome database is the suite of SWISS-PROT, SWISS-2DPAGE and SWISS-3DIMAGE databases (Appel et al., 1993; Appel et al., 1994; Appel, Bairoch and Hochstrasser, 1994; Bairoch and Boeckmann, 1994). The features of these three databases are listed in Table 4. SWISS-PROT, SWISS-2DPAGE and SWISS-3DIMAGE are accessible through the World Wide Web



Table 4: The SWISS-PROT, SWISS-2DPAGE and SWISS-3DIMAGE suite of linked databases. All three databases are accessible through the World Wide Web, at URL address http://expasy.hcuge.ch/

SWISS-PROT
  Information: Text entries of sequence data; citation information; taxonomic data
  Annotations: Protein function; post-translational modifications; domains; secondary structure; quaternary structure; diseases associated with protein; sequence conflicts
  Cross-referenced databases: SWISS-2DPAGE, SWISS-3DIMAGE, EMBL, PIR, PDB, OMIM, PROSITE, Medline, FlyBase, GCRDb, MaizeDB, WormPep, DictyDB
  Other features: Navigation to other SWISS databases achieved by selecting entries with computer mouse

SWISS-2DPAGE
  Information: 2-D gel images of human liver, plasma, HepG2, HepG2 secreted proteins, red blood cell, lymphoma, cerebrospinal fluid, macrophage-like cell line, erythroleukemia cell, platelet
  Annotations: Gel images where protein is found; how protein identified; protein pI and MW; protein number; normal and pathological variants
  Cross-referenced databases: SWISS-PROT and all other databases accessible through SWISS-PROT
  Other features: Gel images show position of identified proteins, or region of gel where protein should appear

SWISS-3DIMAGE
  Information: Collection of 3-D images of proteins
  Annotations: All annotation is available in SWISS-PROT
  Cross-referenced databases: SWISS-PROT and all other databases accessible through SWISS-PROT
  Other features: Mono and stereo images available; images can be transferred to local computer image viewing programs
(Berners-Lee et al., 1992), which allows any ... information ... to be accessed ... with a computer mouse ... Information about a protein, including amino acid sequence, ... and post-translational modifications, can be obtained; the precise protein spot which corresponds to it on a reference map can be viewed if known; and the 3-D structure of the molecule can be viewed if available. References to nucleic acid ... information ...

... GenBank or SWISS-PROT because they are ... (Kohara, Akiyama and Isono, 1987) ... location on Kohara clones ... regulation information ... of genes ... member of regulon or stimulon ... protein under different growth conditions ... referenced to SWISS-PROT (Bairoch and Boeckmann, 1994). It is anticipated that organism databases will ... available information about a particular organism ... There is currently no standard manner in which such databases are assembled, which may hamper comparisons in the future.

Identification and characterisation of proteins

The number of proteins identified on a 2-D reference map ... research and reference tool. ... identified, a major aim of current ... is to screen proteins from 2-D maps in order to define them ... of unknown proteins ...



Table 5: Hierarchical analysis for mass screening of 2-D gel proteins

Techniques, in order of application, with references:
  Amino acid analysis with N-terminal sequence tag (Hobohm, Houthaeve and Sander, 1994; Wilkins et al., submitted)
  Peptide mass fingerprinting (Mann, Hojrup and Roepstorff, 1993)
  Combination of amino acid analysis and peptide mass fingerprinting (Sutton et al., 1995; Cordwell et al., 1995)
  Mass spectrometry sequence tag (Mann and Wilm, 1994)
  Extensive N-terminal Edman microsequencing (Matsudaira, 1987)
  Internal peptide Edman microsequencing (Rosenfeld et al., 1992; Hellman et al., 1995)
  Microsequencing by mass spectrometry: electrospray ionisation, post-source decay MALDI-TOF (Johnson and Walsh)
  Ladder sequencing (Bartlet-Jones et al.)



... This involves the use of rapid and cheap identification tools, such as amino acid analysis and peptide mass fingerprinting, as first steps in protein identification, followed by slower, more expensive and time-consuming identification techniques. In the construction of this hierarchy, ... and the complexity of the data created have been considered, as well as machine time per sample and the analysis of data ... The techniques in the hierarchy are discussed in detail below. For other protein identification techniques, see Patterson (1994).

PROTEIN IDENTIFICATION BY AMINO ACID COMPOSITION

The amino acid composition of proteins ... identification of proteins in databases ... chromatography-based analysis. Proteins blotted to PVDF membranes can be hydrolysed in 1 h at 155°C, amino acids extracted in a single brief step, and each sample automatically derivatised and separated by chromatography in under 40 minutes (Wilkins et al., 1995; Ou et al., 1995). In this manner, one operator can routinely analyse 100 proteins per week on one HPLC unit. This technology lends itself to automation, and it is anticipated that instruments with even greater sample throughput will be developed. When proteins have been prepared by micropreparative 2-D electrophoresis (Hanash et al., 1991; Bjellqvist et al., 1993b), blotted to a PVDF membrane and stained with amido black, any visible protein spot is of sufficient quantity for amino acid analysis (Cordwell et al., 1995; Wasinger et al., 1995; Wilkins et al., 1995).

Figure 4. Computer printout from the ExPASy server, where the empirical amino acid composition, estimated pI and MW of a protein from an E. coli reference map were matched against all entries in SWISS-PROT for E. coli. The correct identification, aspartate carbamoyltransferase, is shown in bold. Low scores indicate a good match. Note how matching within a defined pI and MW range (lower list of proteins) has greatly increased the score difference between the first and second ranking proteins. This score difference gives high confidence in the identification, and is only observed where the top-ranking protein is the correct identification (Wilkins et al., 1995).

After the amino acid composition of a protein has been determined, computer programs are used to match it against the calculated compositions of proteins in databases (Eckerskorn et al., 1988; Sibbald, Sommerfeldt and Argos, 1991; Jungblut et al., 1992; Shaw, 1993; Hobohm, Houthaeve and Sander, 1994; Wilkins et al., 1995). Matching is usually done with only 15 or 16 amino acids, as cysteine and tryptophan are destroyed during hydrolysis, asparagine and glutamine are deamidated to their corresponding acids, and proline is not quantitated in some analysis systems. The computer programs produce a list of best-matching proteins, which are ranked by a score that indicates the match quality. Some programs allow matching to be restricted to specific windows of MW and pI (Hobohm, Houthaeve and Sander, 1994; Wilkins et al., 1995), and to protein database entries for one species (Jungblut et al., 1992; Wilkins et al., 1995). The use of such restrictions increases the power of matching. An example of protein identification by amino acid composition is shown in Figure 4. To date, amino acid composition has been used to identify proteins from reference maps of Spiroplasma melliferum, Mycoplasma genitalium, E. coli, Saccharomyces cerevisiae, Dictyostelium discoideum, human sera, human heart, human lymphocyte, and mouse brain (Cordwell et al., 1995; Wasinger et al., 1995; Wilkins et al., 1995; Jungblut et al., 1991, 1994; Garrels et al., 1994; Frey et al., 1994).

Figure 5. A PVDF protein spot from an E. coli 2-D reference map was sequenced for 4 cycles, and the same sample then subjected to amino acid analysis. The N-terminal sequence was MLKR. When the amino acid composition of the spot, as well as its estimated pI and MW, were matched against all entries in SWISS-PROT for E. coli, a list of best matches was produced, and N-terminal sequences were taken from SWISS-PROT for those entries. The top-ranking identification of serine hydroxymethyltransferase (shown in bold) did not show a large score difference between the first and second ranking proteins, giving little confidence in this being the correct protein identification. However, the sequence tag (MLKR) confirmed the identity of the protein as serine hydroxymethyltransferase.
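The composition matching described above can be sketched as follows. The score used here is the sum of squared differences in mol% between the unknown and each database entry, one simple choice among the scoring schemes of the cited programs; entries can be restricted to one species and to pI/MW windows, and the gap between the first- and second-ranked scores is reported as a rough confidence indicator. The four-residue compositions, protein names and window defaults are toy assumptions; a real search would use all 15 or 16 quantitated amino acids.

```python
def composition_score(unknown, candidate):
    """Sum of squared differences in mol% over the residues measured for the unknown.

    Lower is better. Pass only the residues the analysis quantitates
    (for example Asx and Glx combined, no Cys or Trp).
    """
    return sum((unknown[aa] - candidate.get(aa, 0.0)) ** 2 for aa in unknown)


def rank_candidates(unknown, database, pi=None, mw=None,
                    pi_tol=0.25, mw_frac=0.2, species=None):
    """Rank entries by composition score, optionally within pI/MW windows and one species."""
    hits = []
    for name, entry in database.items():
        if species is not None and entry["species"] != species:
            continue
        if pi is not None and abs(entry["pi"] - pi) > pi_tol:
            continue
        if mw is not None and abs(entry["mw"] - mw) > mw_frac * mw:
            continue
        hits.append((composition_score(unknown, entry["composition"]), name))
    return sorted(hits)


def score_gap(hits):
    """Second-ranked minus first-ranked score; a large gap suggests a confident call."""
    return hits[1][0] - hits[0][0] if len(hits) > 1 else float("inf")


# Toy data: mol% of four residues only.
unknown = {"Asx": 10.0, "Glx": 12.0, "Gly": 8.0, "Leu": 9.0}
database = {
    "P_A": {"species": "E. coli", "pi": 6.0, "mw": 20000,
            "composition": {"Asx": 10.5, "Glx": 11.5, "Gly": 8.0, "Leu": 9.5}},
    "P_B": {"species": "E. coli", "pi": 6.1, "mw": 21000,
            "composition": {"Asx": 9.0, "Glx": 13.0, "Gly": 9.0, "Leu": 8.0}},
    "P_C": {"species": "E. coli", "pi": 9.5, "mw": 60000,
            "composition": {"Asx": 10.0, "Glx": 12.5, "Gly": 8.0, "Leu": 9.0}},
}
all_hits = rank_candidates(unknown, database, species="E. coli")
windowed = rank_candidates(unknown, database, pi=6.0, mw=20000, species="E. coli")
print(all_hits, score_gap(all_hits))  # [(0.25, 'P_C'), (0.75, 'P_A'), (4.0, 'P_B')] 0.5
print(windowed, score_gap(windowed))  # [(0.75, 'P_A'), (4.0, 'P_B')] 3.25
```

Restricting the search to the spot's pI/MW window removes the distant entry and widens the score gap, mirroring the effect described in the caption to Figure 4.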



PROTEIN IDENTIFICATION BY AMINO ACID COMPOSITION AND N-TERMINAL SEQUENCE TAG

When samples from 2-D gels are not unambiguously identified by amino acid composition, pI and MW, the correct identity of that protein is often amongst the top rankings of the list (Hobohm, Houthaeve and Sander, 1994; Cordwell et al., 1995; Wilkins et al., 1995). Taking advantage of this observation, we have used the mass spectrometry "sequence tag" concept (Mann and Wilm, 1994) in developing a combined Edman degradation and amino acid analysis approach to protein identification (Wilkins et al., submitted). This involves the N-terminal sequencing of PVDF-blotted proteins by Edman degradation for 3 or 4 cycles to create a "sequence tag", following which the same sample is used for amino acid analysis. As only a few amino acids are removed from the protein, its composition is not significantly altered. Furthermore, since only a small amount of protein sequence is required, fast but low repetitive yield Edman degradation cycles can be used. Modifications to current procedures should allow 3 cycles to be completed in 1 h, thereby allowing the screening of 100 or more proteins per week on one automated, multi-cartridge sequenator. Amino acid composition, pI and MW of proteins are matched against databases as described above, and N-terminal sequences of best-matching proteins are checked against the "sequence tag" to confirm the protein identity (Figure 5). This technique will be less useful when proteins are N-terminally blocked, but as only a few N-terminal amino acids are susceptible to the acetyl, formyl, or pyroglutamyl modifications that cause blockage, this may itself provide useful information for sequence tag identification. A strength of N-terminal sequence tag and amino acid composition protein identification is that the data generated are quickly and easily interpreted.
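Checking ranked candidates against an N-terminal sequence tag reduces to a prefix comparison. This sketch treats "X" in the tag as an unassigned Edman cycle and optionally allows for removal of the initiator methionine; both conventions, the function names and the candidate sequences are illustrative assumptions rather than part of the published method.

```python
def prefix_matches(sequence, tag):
    """True if sequence begins with tag; 'X' in the tag stands for an unassigned Edman cycle."""
    return len(sequence) >= len(tag) and all(
        t == "X" or t == s for t, s in zip(tag, sequence))


def matches_tag(sequence, tag, allow_met_removal=True):
    """Compare an Edman tag with a database N-terminus, optionally after loss of initiator Met."""
    if prefix_matches(sequence, tag):
        return True
    return allow_met_removal and sequence.startswith("M") and prefix_matches(sequence[1:], tag)


def confirm_by_tag(ranked_names, sequences, tag):
    """Keep, in rank order, the candidates whose N-terminal sequence agrees with the tag."""
    return [name for name in ranked_names if matches_tag(sequences[name], tag)]


ranked = ["cand1", "cand2", "cand3"]  # e.g. output of a composition search
sequences = {"cand1": "MSKIAVLG", "cand2": "MLKREMNI", "cand3": "MAEKLQTS"}
print(confirm_by_tag(ranked, sequences, "MLKR"))  # ['cand2']
print(confirm_by_tag(ranked, sequences, "LKRE"))  # ['cand2'] (initiator Met removed)
print(confirm_by_tag(ranked, sequences, "MXK"))   # ['cand1', 'cand2']
```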



PROTEIN IDENTIFICATION BY PEPTIDE MASS FINGERPRINTING 

Techniques for the identification of proteins by peptide mass fingerprinting have recently been described (Henzel et al., 1993; Pappin, Hojrup and Bleasby, 1993; James et al., 1993; Mann, Hojrup and Roepstorff, 1993; Yates et al., 1993; Mortz et al., 1994; Sutton et al., 1995). This involves the generation of peptides from proteins using residue-specific enzymes, the determination of peptide masses, and the matching of these masses against theoretical peptide libraries generated from protein sequence databases. As proteins have different amino acid sequences, their peptides should produce characteristic "fingerprints".

The first step of peptide mass fingerprinting is protein digestion. Proteins within the gel matrix or bound to PVDF can be enzymatically digested in situ, although in situ digests are reported to produce more enzyme autodigestion products, which complicate subsequent peptide mass analyses (James et al., 1993; Rasmussen et al., 1994; Mortz et al., 1994). The enzyme of choice for digestion is currently trypsin (of modified sequencing grade), but other enzymes (Lys-C or S. aureus V8 protease) have also been used (Pappin, Hojrup and Bleasby, 1993). To maximise the number of peptides obtained, it is desirable for protein samples to be reduced and alkylated prior to digestion (Mortz et al., 1994; Henzel et al., 1993). This ensures that all disulfide bonds of the protein are broken, and produces protein conformations that are more amenable to digestion. Surprisingly, chemical digestion methods such as cyanogen bromide (methionine specific), formic acid (aspartic acid specific), and 2-(2-nitrophenylsulfenyl)-3-methyl-3-bromoindolenine (tryptophan specific) have not been explored as means of peptide production for mass fingerprinting, even though they are rapid and may circumvent some problems associated with enzyme digestions.
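Generating a theoretical peptide library from a protein sequence can be sketched as an in silico digest. By default the rule below cuts after lysine or arginine unless the next residue is proline (the usual trypsin rule), allows optional missed cleavages, and computes singly protonated monoisotopic masses for unmodified residues. If cysteines were alkylated before digestion, as recommended above, their residue mass would need adjusting (for example +57.02146 Da for carbamidomethylation). Passing a different residue set gives other specificities, such as methionine for cyanogen bromide, although the homoserine lactone formed by that reagent is not modelled in the mass function. The function names are illustrative.

```python
# Monoisotopic residue masses (Da) for unmodified residues.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def digest(seq, cleave_after="KR", not_before="P", missed=0):
    """Cut after residues in cleave_after unless the next residue is in not_before."""
    sites = [i + 1 for i in range(len(seq) - 1)
             if seq[i] in cleave_after and seq[i + 1] not in not_before]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of missed cleavage sites inside the peptide
        for j in range(i + 1, min(i + 2 + missed, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


def mh_mass(peptide):
    """Singly protonated [M+H]+ monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


print(digest("AKPGRLK"))                                 # ['AKPGR', 'LK']
print(digest("AKPGRLK", missed=1))                       # ['AKPGR', 'AKPGRLK', 'LK']
print(digest("AMGKM", cleave_after="M", not_before=""))  # ['AM', 'GKM']
print(round(mh_mass("GK"), 3))                           # 204.134
```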
A number of computer programs are available for matching peptide masses against databases (reviewed in Cottrell, 1994). Matching is usually undertaken in an interactive manner, whereby peaks of mass 500-5000 Da are selected and matched under various search parameters, including MW of protein, mass accuracy of peptides and number of missed enzyme cleavages allowed (Henzel et al., 1993; Mortz et al., 1994; Rasmussen et al., 1994). The correct protein identity is the protein which has the most peptide masses in common with the unknown sample. Identities have been established with as few as three peptides, but unambiguous identification is thought to require a mass spectrometry map covering most peptides of the protein (Mortz et al., 1994; Yates et al., 1993). To date, peptide mass fingerprinting of proteins has been undertaken from the human myocardial protein and keratinocyte maps, from an E. coli 2-D gel, and from reference maps of Spiroplasma melliferum and Mycoplasma genitalium (Sutton et al., 1995; Rasmussen et al., 1994; Henzel et al., 1993; Cordwell et al., 1995; Wasinger et al., 1995), although the technique is most powerful when used in combination with another protein identification technique (Rasmussen et al., 1994; Cordwell et al., 1995).
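Ranking database proteins by the number of shared peptide masses, after restricting peaks to 500-5000 Da, might look like this. The 0.5 Da tolerance, the theoretical mass lists and the protein names are invented for illustration; real programs also take protein MW, missed cleavages and mass accuracy into account, as described above.

```python
def count_matches(peaks, theoretical, tol):
    """Number of observed peaks lying within tol (Da) of some theoretical peptide mass."""
    return sum(1 for m in peaks if any(abs(m - t) <= tol for t in theoretical))


def rank_by_fingerprint(observed, library, tol=0.5, low=500.0, high=5000.0):
    """Rank proteins by matched peak count, using only peaks inside the low-high window."""
    peaks = [m for m in observed if low <= m <= high]
    scored = [(count_matches(peaks, masses, tol), name) for name, masses in library.items()]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


observed = [450.2, 842.5, 1045.6, 1479.8, 2211.1]  # 450.2 falls outside the window
library = {
    "protA": [842.4, 1045.5, 1479.9, 3000.0],
    "protB": [842.6, 1500.0, 2211.0],
    "protC": [450.2, 999.0],
}
print(rank_by_fingerprint(observed, library))  # [(3, 'protA'), (2, 'protB'), (0, 'protC')]
```

A simple count like this cannot distinguish a chance match of a few common masses from genuine coverage, which is one reason the text recommends combining fingerprinting with a second identification technique.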



MASS SPECTROMETRY SEQUENCE TAGGING 

An extension of peptide mass fingerprinting has recently been described, called peptide sequence tagging (Mann and Wilm, 1994; Mann, 1995). This uses tandem mass spectrometry (MS/MS) to initially determine the mass of peptides, then subject them to fragmentation by collision with a gas, and finally determine the mass of the fragments. The resulting spectra give information about a peptide's amino acid sequence. The fragmentation masses of peptides can rarely be used to assign a complete sequence, but they usually allow a short "sequence tag" of 2 or 3 amino acids to be determined. This sequence tag and the original peptide mass are matched by computer against a database, providing a likely identity of the peptide and the protein it came from. The major drawback of this technique as a mass screening tool is the complexity of the mass data generated and the high level of expertise required for its interpretation. Nevertheless, it represents a useful new protein identification method which greatly increases the power of peptide mass fingerprinting protein identification.
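A minimal sequence-tag search combines a parent-mass filter with a substring test. This simplification is ours: Mann and Wilm's method also uses the fragment masses flanking the tag, whereas the sketch checks only the parent mass and the tag itself, and it accepts the tag in either direction because the orientation of a short tag may be ambiguous. Residue masses are standard monoisotopic values for unmodified residues; the function names, tolerance and candidate peptides are illustrative.

```python
# Monoisotopic residue masses (Da) for unmodified residues.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def tag_search(candidates, parent_mass, tag, tol=0.5):
    """Return (protein, peptide) pairs within tol of parent_mass that contain the tag."""
    hits = []
    for protein, peptides in candidates.items():
        for pep in peptides:
            if abs(peptide_mass(pep) - parent_mass) > tol:
                continue
            # Accept the tag read in either direction (an assumption of this sketch).
            if tag in pep or tag[::-1] in pep:
                hits.append((protein, pep))
    return hits


candidates = {
    "protA": ["LVNELTEFAK", "HLVDEPQNLIK"],
    "protB": ["AEFVEVTK", "LVNEVTEFAK"],
}
parent = peptide_mass("LVNELTEFAK")
print(tag_search(candidates, parent, "TEF"))  # [('protA', 'LVNELTEFAK')]
print(tag_search(candidates, parent, "KAF"))  # [('protA', 'LVNELTEFAK')] (reversed tag)
```

The protB peptide carries the same tag but differs from the parent mass by one Leu/Val substitution (about 14 Da), so the mass filter removes it; neither criterion alone would have been decisive.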

Cross-species protein identification 

Protein sequence databases continue to grow at a rapid rate, yet it is not widely appreciated that close to 90% of all information contained in current protein databases comes from only 10 species (A. Bairoch, personal communication). Fortunately, this information can be used to study proteomes of organisms that are poorly defined at the molecular level, via 2-D electrophoresis and "cross-species" protein identification (Cordwell et al., 1995; Wasinger et al., 1995). This approach allows proteins from reference maps of many different species to be identified without the need for the corresponding genes to be cloned and sequenced. This is particularly true for "housekeeping" proteins, such as enzymes involved in glycolysis, DNA manipulation and protein manufacture, which are highly conserved across species boundaries. Proteins that cannot be identified across species boundaries can then become the focus of further protein characterisation and DNA sequencing efforts.
The status of proteome projects

Many technical aspects of proteome research have already been discussed ...
The study of vertebrate proteomes and vertebrate development is a ... because vast numbers of proteins are developmentally expressed ...
FUTURE DIRECTIONS OF PROTEOME PROJECTS

This review has described recent advances in the area of proteome research. It has illustrated how new developments of older techniques (2-D electrophoresis and amino acid analysis), as well as the application of new technology (mass spectrometry), have greatly widened the choice of tools the biologist and protein chemist have for the separation, identification and analysis of complex mixtures of proteins. This has made possible the establishment of detailed reference maps for organisms, which are becoming the method of choice for the definition of tissues or whole cells, and for the investigation of gene expression therein.

Proteome projects are already impacting on the dogma of molecular biology that DNA sequence constitutes the definition of an organism. For example, the proteomes of different tissues of a single organism are often significantly different. Similarly, cross-species identification of proteins (for example the identification of proteins from Candida albicans by comparison with S. cerevisiae) can open up studies on organisms that are poorly defined at the molecular level. As cross-species identification can proceed at a pace orders of magnitude faster than a genome project in terms of defining the gene and protein complement of organisms, the need for the DNA sequencing of genomes will be avoided, and emphasis placed on those found to be novel.

Just as genome sequencing is not an end in itself, neither is an annotated 2-D protein reference map of an organism, nor indeed the identification of proteins in a proteome. So whilst an immediate aim of proteome projects is to screen proteins in reference maps, this will lead to expression studies and characterisation of post-translational modifications. The challenge that then needs to be addressed is the investigation of the structure and function of proteins in a proteome. The magnitude of this is illustrated by the fact that over half the open reading frames identified in S. cerevisiae chromosome III were initially of no known function (Oliver et al., 1992). Structural and functional studies will be an undertaking just as formidable as genome studies are now and proteome projects are becoming, but will lead to an unimaginably detailed understanding of how living organisms are constructed and how they operate.
ABSTRACT Analysis of cellular protein patterns by computer-aided 2-dimensional gel electrophoresis, together with recent advances in protein sequence analysis, has made possible the establishment of comprehensive 2-dimensional gel protein databases that may link protein and DNA information and that offer a global approach to the study of the cell. Using the integrated approach offered by 2-dimensional gel protein databases, it is now possible to reveal phenotype-specific proteins, to microsequence them, to search for homology with previously identified proteins, to clone the cDNAs, to assign partial protein sequences to genes for which the full DNA sequence and the chromosome location are known, and to study the regulatory properties and function of groups of proteins that are coordinately expressed in a given biological process. Human 2-dimensional gel protein databases are becoming increasingly important in view of the concerted effort to map and sequence the entire genome. Celis, J. E.; Rasmussen, H. H.; Leffers, H.; Madsen, P.; Honore, B.; Gesser, B.; Dejgaard, K.; Vandekerckhove, J. Human cellular protein patterns and their link to genome DNA sequence data: usefulness of two-dimensional gel electrophoresis and microsequencing. FASEB J. 5: 2200-2208; 1991.

Key Words: human protein patterns • 2-dimensional gel protein databases • gene expression • microsequencing • cDNA cloning • linking protein and DNA information • genome mapping and sequencing

Proteins synthesized from information contained in the DNA orchestrate most cellular functions. The total number of proteins synthesized by a typical human cell is unknown, although current estimates range from 3000 to 6000. Of these, as many as 70% may perform household functions and are expected to be shared by all cell types irrespective of their origin. There are many different cell types in the human body, with perhaps 30,000 to 50,000 proteins expressed in the organism as a whole, judged from the fact that about 3% of the haploid genome corresponds to genes. Today only a small fraction of the total set of proteins has been identified, and little is known about the protein patterns of individual cell types or their variation under physiological and abnormal conditions.
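The 30,000 to 50,000 range can be checked with simple arithmetic. The sketch below is a consistency check, not the authors' calculation: it assumes a human haploid genome of about 3 × 10^9 bp (a figure not given in the text) and asks what average amount of gene DNA per protein the quoted range implies.

```python
# Consistency check for "30,000 to 50,000 proteins ... about 3% of the haploid
# genome correspond to genes". ASSUMPTION: the genome size below is an
# approximate textbook value, not a number taken from the text.
HAPLOID_GENOME_BP = 3.0e9
CODING_FRACTION = 0.03  # "about 3% of the haploid genome"


def gene_dna_bp(genome_bp: float = HAPLOID_GENOME_BP,
                fraction: float = CODING_FRACTION) -> float:
    """Total base pairs attributed to genes."""
    return genome_bp * fraction


def implied_bp_per_gene(gene_count: int) -> float:
    """Average DNA length per gene implied by a given number of genes."""
    return gene_dna_bp() / gene_count


if __name__ == "__main__":
    for n in (30_000, 50_000):
        print(f"{n} genes -> about {implied_bp_per_gene(n):.0f} bp per gene")
```

Under that assumed genome size, the quoted range corresponds to an average of roughly 1.8 to 3 kb of gene DNA per protein.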

For the past 15 years, high-resolution 2-dimensional gel electrophoresis has been the technique of choice to determine the protein composition of a given cell type and to monitor changes in gene activity through quantitative and qualitative analysis of the thousands of proteins that orchestrate various cellular functions (refs 1-6 and references therein). The technique, originally described by O'Farrell, separates proteins in terms of their isoelectric point (pI) and molecular weight. Usually one chooses a condition of interest and the cell reveals the global protein behavioral response, as all detected proteins can be analyzed both qualitatively and quantitatively in relation to each other. At present, most available 2-dimensional gel techniques (regular gel format) can resolve between 1000 and 2000 proteins from a given mammalian cell type, a number that corresponds to about 2 million base pairs of coded DNA. Less abundant proteins can be detected by analyzing partially purified cellular fractions.
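Because the two dimensions separate by pI and molecular weight, a protein's expected position can be estimated from its sequence. The sketch below is illustrative only: it uses average residue masses and one assumed set of pKa values (published pI calculators use different sets and give somewhat different results), treats the chain as unmodified, and ignores the fact that the apparent molecular weight on a gel can differ from the calculated value.

```python
# Predicted position on a 2-D gel: molecular weight and isoelectric point.
# ASSUMPTIONS: average residue masses (Da) and the pKa set below; other
# published pKa sets give somewhat different pI values.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini

PKA = {"Nterm": 8.6, "Cterm": 3.6, "K": 10.8, "R": 12.5, "H": 6.5,
       "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge at a given pH."""
    positive = 1 / (1 + 10 ** (ph - PKA["Nterm"]))
    negative = 1 / (1 + 10 ** (PKA["Cterm"] - ph))
    for aa in seq.upper():
        if aa in ("K", "R", "H"):
            positive += 1 / (1 + 10 ** (ph - PKA[aa]))
        elif aa in ("D", "E", "C", "Y"):
            negative += 1 / (1 + 10 ** (PKA[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """pH of zero net charge, found by bisection (charge falls as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    peptide = "GTVTDFPGFDER"  # partial sequence of lipocortin V (Table 1)
    print(f"{molecular_weight(peptide):.1f} Da, pI {isoelectric_point(peptide):.2f}")
```

A peptide's pI is not that of the intact protein; for a full-length translated sequence the same functions give the kind of calculated pI quoted in Table 1.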

Two-dimensional gel electrophoresis has been widely applied to the analysis of cellular protein patterns from bacteria to mammalian cells (refs 1-6, and references therein). In spite of much work, however, information gathered from these studies has not reached the scientific community in its fullness because of the lack of standardized gel systems and the lack of means for storing and communicating protein information. Only recently, because of the development of appropriate computer software (7-13), has it been possible to scan gels, assign numbers to individual proteins, and store the wealth of information in quantitative and qualitative comprehensive 2-dimensional gel protein databases (4, 14-23), i.e., those containing information about the various properties (physical, chemical, biological, biochemical, physiological, genetic, immunological, architectural, etc.) of all the proteins that can be detected in a given cell type. Such integrated 2-dimensional gel protein databases offer an easy and standardized medium in which to store and communicate protein information and provide a unique framework in which to focus a multidisciplinary approach to study the cell. Once a protein is identified in the database, all of the information accumulated can be easily retrieved and made available to the researcher. In the long run, protein databases are expected to foster a wide variety of biological information that may be instrumental to researchers working in many areas of biology, among them cancer and oncogene studies, differentiation, development, drug development and testing, genetic variation, and diagnosis of genetic and clinical diseases (Fig. 1).

The approach using systematic 2-dimensional gel protein analysis has recently gained a new dimension with the advent of techniques to microsequence major proteins recorded in the databases (refs 24-42 and references therein). Partial protein sequences can be used to search for protein identity as well as to prepare specific DNA probes for cloning as-yet-uncharacterized proteins (Fig. 1). As these sequences can be stored in the database (see for example Fig. 2H), they offer a unique opportunity to link information on proteins with the existing or forthcoming DNA sequence data on the human genome (Fig. 1) (20, 36, 39).

Figure 1. Interface between partial protein sequence databases, comprehensive 2-dimensional gel databases, and the human genome sequencing project. Appropriate software is required to compare protein and DNA sequences. In general, although the inference of a protein's sequence from the DNA sequence (thick arrow) is direct and unambiguous, the DNA sequence can only be inferred approximately from the protein sequence (thin arrow), and cloning of the gene requires either a cDNA or the requisite group of oligonucleotide probes deduced from the partial amino acid sequence. Modified from ref 6.
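Why the protein-to-DNA direction is only approximate can be made concrete: every amino acid except Met and Trp has several codons, so a probe deduced from a partial sequence is a mixture of oligonucleotides. The sketch uses the codon counts of the standard genetic code and picks the least degenerate stretch of a peptide; the 6-residue window is an illustrative choice, and real probe design also weighs factors such as codon usage and hybridization conditions.

```python
# How ambiguous is the step from a partial protein sequence back to DNA?
# Codon counts per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
    "G": 4, "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2,
    "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that could encode the peptide."""
    total = 1
    for aa in peptide.upper():
        total *= CODONS_PER_AA[aa]
    return total


def least_degenerate_window(peptide: str, length: int):
    """(start, window, degeneracy) of the stretch with the fewest possible encodings."""
    best = None
    for start in range(len(peptide) - length + 1):
        window = peptide[start:start + length]
        d = probe_degeneracy(window)
        if best is None or d < best[2]:
            best = (start, window, d)
    return best


if __name__ == "__main__":
    peptide = "GTVTDFPGFDER"  # partial sequence of lipocortin V (Table 1)
    print(probe_degeneracy(peptide))
    print(least_degenerate_window(peptide, 6))  # 6 codons = an 18-nucleotide probe
```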

Using the integrated approach offered by comprehensive 2-dimensional gel databases (Fig. 1), it will be possible to identify phenotype-specific proteins; microsequence them and store the information in the database; search for homology with previously characterized proteins; clone the cDNAs; assign partial protein sequences to genes for which the full DNA sequence and the chromosome location are known; and study the regulatory properties and function of groups of proteins (pathways, organelles, etc.) that are coordinately expressed in a given biological process. Comprehensive 2-dimensional gel protein databases will depict an integrated picture of the expression levels and properties of the thousands of protein components of organelles, pathways, and cytoskeletal systems in both physiological and abnormal conditions and are expected to lead to the identification of new regulatory networks in different cell types and organisms. In the future, 2-dimensional gel protein databases may be linked to each other as well as to national and international specialized databanks on nucleic acid and protein sequences, protein structures, NMR experimental data, complex carbohydrates, etc.

A few 2-dimensional gel protein databases that are accessible in computer form have been published in extenso: these correspond to the protein-gene database of Escherichia coli K-12 developed by Neidhardt and colleagues (14, 23), the rat REF52 database established by Garrels and co-workers at Cold Spring Harbor (18, 22), and a few human databases (transformed amnion cells [15, 20], normal embryonal lung MRC-5 fibroblasts [17, 21], keratinocytes [19], and peripheral blood mononuclear cells [15]) developed in Aarhus. Given space limitations and to keep this review in focus, we will concentrate on the computerized analysis of human cellular 2-dimensional gel patterns, and in particular on the steps involved in establishing comprehensive 2-dimensional gel databases that can link protein and DNA information.



MAKING AND MANAGING A COMPREHENSIVE 
2-DIMENSIONAL GEL DATABASE OF HUMAN 
CELLULAR PROTEINS 

The first step in making a comprehensive 2-dimensional gel protein database is to prepare a synthetic image (a digitized version of the gel image) of the gel (fluorogram, Coomassie blue or silver stained gel) to be used as a standard or master reference. This can be done with laser scanners, charge-coupled device (CCD) array scanners, television cameras, rotating drum scanners, and multiwire chambers (13). Computerized analysis systems for spot detection, quantitation, pattern matching, and data handling (access and retrieval of information, database making) have been described in the literature (ELSIE [43], GELLAB [11], HERMeS [44], MELANIE [10], QUEST [9], and TYCHO [8]), and some are available commercially (PDQUEST, Protein Databases Inc., Huntington, N.Y.; KEPLER, Large Scale Biology, Rockville, Md.; Visage, BioImage Corporation, Ann Arbor, Mich.; Gemini, Joyce Loebl, Gateshead; Microscan 1000, Technology Resources Inc., Nashville, Tenn.; and MasterScan, Billerica, Mass.). Unfortunately, most of these systems are incompatible with one another; their advantages and disadvantages have been discussed by Miller (13).

In our work station in Aarhus, fluorograms are scanned with a Molecular Dynamics laser scanner and the data are analyzed using the PDQUEST II software (Protein Databases Inc.) (12) running on a SPARCstation computer 4100 FC-8-P3 from Sun Microsystems, Inc. The scanner measures intensity in the range of 0-2.0 absorbance. A typical scan of a 17 × 17 cm fluorogram takes about 2 min. Steps in image analysis include initial smoothing, background subtraction, final smoothing, spot detection, and fitting of ideal Gaussian distributions to spot centers. Spot intensity is calculated as the integral of the fitted Gaussian. If calibration strips containing individual segments of a known amount of radioactivity are used, it is possible to merge multiple exposures of the sample image into a single data image of greater dynamic range. Once the synthetic image is created, it can be stored on disk and displayed directly on the monitor. Functions that can be used to edit the images include: cancel (for example, to erase scratches that may have been interpreted as spots by the computer, or to cancel streaks or low-dpm spots), combine (sometimes a spot may be resolved into several closely packed spots), restore, uncombine, and add spot to the gel. The process is time consuming, taking about 1.5 days per image. Edited standard images can be matched to other synthetic images. Figure 2A shows a portion of a standard synthetic image (IEF) of a fluorogram of [35S]methionine-labeled cellular proteins from human AMA cells (master database) (20). Images can be displayed either in black and white (resembling the original fluorograms) or in color (other images in Fig. 2), depending on the need. As shown in Fig. 2B, each polypeptide is assigned a number by the computer, which facilitates the entry and retrieval of qualitative and quantitative information for any given spot in the gel (20). The standard image can be matched automatically by the computer to other standard or reference gels (Fig. 2C, matching of AMA cellular proteins [left] to MRC-5 proteins [right]), provided a few landmark spots are given manually as references (indicated with a * in Fig. 2C) to initiate the process.
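The text states that spot intensity is the integral of a fitted Gaussian. PDQUEST's actual spot model and fitting procedure are not described here, so the sketch assumes an axis-aligned 2-D Gaussian already fitted to a background-subtracted image and shows only the integration step: the fitted amplitude and widths give the spot volume in closed form, which is checked against a brute-force grid sum.

```python
import math


def gaussian_2d(x, y, amplitude, x0, y0, sigma_x, sigma_y):
    """Axis-aligned 2-D Gaussian spot model (an assumption; no rotation term)."""
    return amplitude * math.exp(-((x - x0) ** 2) / (2 * sigma_x ** 2)
                                - ((y - y0) ** 2) / (2 * sigma_y ** 2))


def spot_volume(amplitude, sigma_x, sigma_y):
    """Integral of the Gaussian over the whole plane: 2 * pi * A * sigma_x * sigma_y."""
    return 2 * math.pi * amplitude * sigma_x * sigma_y


def numeric_volume(amplitude, x0, y0, sigma_x, sigma_y, step=0.05, span=6.0):
    """Riemann-sum check of the same integral over +/- span standard deviations."""
    nx = round(2 * span * sigma_x / step)
    ny = round(2 * span * sigma_y / step)
    total = 0.0
    for i in range(nx + 1):
        x = x0 - span * sigma_x + i * step
        for j in range(ny + 1):
            y = y0 - span * sigma_y + j * step
            total += gaussian_2d(x, y, amplitude, x0, y0, sigma_x, sigma_y)
    return total * step * step


if __name__ == "__main__":
    exact = spot_volume(1000.0, 1.5, 2.0)
    approx = numeric_volume(1000.0, 10.0, 20.0, 1.5, 2.0)
    print(f"closed form {exact:.1f}, grid sum {approx:.1f}")
```

Using the closed form means the volume depends only on the fitted parameters, which is one reason Gaussian fitting is attractive for overlapping spots where pixel summation would double count.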



Abbreviations: CCD, charge-coupled device; PCNA, proliferating cell nuclear antigen; HPLC, high performance liquid chromatography.




Figure 2. A) Synthetic image of a fraction of an IEF gel of the master image of AMA cellular proteins. B) As in A but showing the numbers assigned to each spot. C) Comparison of AMA (left) and normal human embryonal lung MRC-5 fibroblast (right) IEF protein patterns. Matched proteins are indicated by a * or by the same letters in both gels. Once a protein is matched, information contained in the various categories available in the master AMA database can be transferred. D) Synthetic image of a fraction of an IEF fluorogram of [35S]methionine-labeled proteins from normal human MRC-5 fibroblasts. The histograms show levels of synthesis of a few proteins in MRC-5 (left bar) and SV40-transformed MRC-5 (right bar) fibroblasts. E) Polypeptides that contain information under the category glycolytic pathway. F) The function peruse annotations for spot allows the operator to inquire about the categories and information available for a given protein. G) Relative abundance of cytoskeletal and cytoskeletal-related proteins in quiescent, proliferating, and SV40-transformed MRC-5 fibroblasts. H) Polypeptides that contain information under the category partial amino acid sequences.
The automatic matching process, which has been described in detail by Garrels et al. (12), takes about 5 min. Matched proteins are indicated with the same letters in both gels (Fig. 2C). The usefulness of this function is emphasized by the fact that data accumulated on common household proteins can be easily transferred to any other human cell type whose 2-dimensional gel cellular protein pattern is matched to our standard AMA 2-dimensional gel protein image. Alternatively, if the standard gel is part of a matchset (the set of gels in a given experiment), it can be used as a linker gel to compare, for example, the quantitative values of a given protein throughout the experiment (see Fig. 2D; levels of some proteins in normal and SV40-transformed human MRC-5 fibroblasts) or with other standard images in different sets of cross-matched experiments (18, 22).
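Automatic matching is seeded with a few manually chosen landmark spots. The algorithm of Garrels et al. (12) is not reproduced here; the sketch below is a simplified stand-in that assumes three landmark pairs define an affine map between the two gels (real gels can distort non-linearly) and then greedily pairs each reference spot with the nearest unmatched spot within a tolerance. All spot identifiers and coordinates are hypothetical.

```python
from math import hypot


def _det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def solve3(m, v):
    """Solve the 3x3 system m @ x = v by Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("landmarks are collinear")
    out = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        out.append(_det3(mc) / d)
    return out


def affine_from_landmarks(src, dst):
    """Exact affine map from three landmark pairs: x' = ax + by + c, y' = dx + ey + f."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, c = solve3(m, [p[0] for p in dst])
    d, e, f = solve3(m, [p[1] for p in dst])
    return lambda p: (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)


def match_spots(ref, other, transform, tol):
    """Greedily pair each reference spot with the nearest unused spot within tol."""
    matches, used = {}, set()
    for ref_id, p in ref.items():
        mx, my = transform(p)
        best, best_dist = None, tol
        for oid, q in other.items():
            if oid in used:
                continue
            dist = hypot(q[0] - mx, q[1] - my)
            if dist <= best_dist:
                best, best_dist = oid, dist
        if best is not None:
            matches[ref_id] = best
            used.add(best)
    return matches


if __name__ == "__main__":
    landmarks_ref = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    landmarks_new = [(2.0, 3.0), (13.0, 3.0), (2.0, 14.0)]
    t = affine_from_landmarks(landmarks_ref, landmarks_new)
    print(match_spots({"8216": (5.0, 5.0)}, {"s1": (7.6, 8.4)}, t, tol=0.5))
```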

Once a standard map of a given protein sample is made, one can enter qualitative annotations to make a reference database. Our master 2-dimensional gel database of transformed human amnion cell (AMA) proteins (20) lists 3430 polypeptides, of which 2592 correspond to cellular components, with pI values ranging from 4 to 13 and molecular weights between 8.5 and 230 kDa. The most abundant protein in the database is total actin (3.87% of total protein; about 90 million molecules per cell), while the least abundant of the recorded polypeptides are present at around 5000 molecules per cell. Some annotation categories we are using to establish the master AMA database include: 1) protein identification (comigration with purified proteins, 2-dimensional immunoblotting, microsequencing); 2) amounts (total amounts and levels of synthesis); 3) subcellular localization (nuclear, cytoskeletal, membrane, membrane receptors, specific organelles, etc.); 4) antibodies; 5) posttranslational modifications (phosphorylation, glycosylation, methylation, etc.); 6) microsequencing; 7) cell cycle specificity (specific variations in levels of synthesis and amount); 8) regulatory behavior (effect of hormones, growth factors, heat shock, etc.); 9) rate of synthesis in normal and transformed cells (proliferation-sensitive proteins, cell cycle-specific proteins, oncogenes, components of the pathway (or pathways) that control cell proliferation); 10) function (mainly from comigration with proteins of known function); 11) sets of proteins that are coordinately regulated (hierarchy of controls, differential gene expression in various cells, etc.); 12) cDNAs (cloned cDNAs); 13) proteins that are specific to a given disease (systematic comparison of fibroblast protein patterns from healthy and diseased individuals); 14) expression and exploitation of transfected cDNAs; 15) pathways (metabolic, others); 16) gene localization (genetic and physical); 17) effect of microinjected antibody on patterns of protein synthesis; and 18) secreted proteins.

Information entered for any spot in a given annotation category can be easily retrieved by asking the computer to display the information on the color screen. For example, Fig. 2E shows a synthetic image of a NEPHGE gel (master AMA database) displaying the information contained under the entry glycolytic pathway. Alternatively, one can use the function peruse annotations for spot to ask the computer directly to list all the entries available for a particular protein. By clicking the mouse on a given entry (in this case, presence in fetal human tissues) it is possible to take a quick look at the information in that particular entry (Fig. 2F).
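The two retrieval modes just described (all spots under one category, and all categories for one spot) map onto a simple two-way index. The sketch below is a purely illustrative in-memory layout, not PDQUEST's storage format; the class names, method names, and category strings are hypothetical, and the example values are taken from Table 1.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SpotRecord:
    ssp: int                                         # spot number assigned by the software
    annotations: dict = field(default_factory=dict)  # category -> entry text


class GelDatabase:
    """Hypothetical two-way index: spot -> categories and category -> spots."""

    def __init__(self):
        self.spots = {}
        self._by_category = defaultdict(set)

    def annotate(self, ssp, category, entry):
        record = self.spots.setdefault(ssp, SpotRecord(ssp))
        record.annotations[category] = entry
        self._by_category[category].add(ssp)

    def spots_in_category(self, category):
        """All spots with an entry in one category (cf. 'glycolytic pathway')."""
        return sorted(self._by_category.get(category, ()))

    def peruse(self, ssp):
        """Categories filled for one spot (cf. 'peruse annotations for spot')."""
        record = self.spots.get(ssp)
        return sorted(record.annotations) if record else []


if __name__ == "__main__":
    db = GelDatabase()
    db.annotate(8216, "protein name", "Lipocortin V")
    db.annotate(8216, "isoelectric point", "4.76")
    print(db.peruse(8216), db.spots_in_category("protein name"))
```

Keeping the reverse index up to date at write time makes the category view a lookup rather than a scan over every spot.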

A major obstacle encountered in building comprehensive 2-dimensional gel protein databases is identifying the large number of proteins separated by this technology. In our databases (20, 21), known proteins are identified by one or a combination of the following procedures: 1) comigration with known proteins, 2) 2-dimensional gel immunoblotting using specific antibodies, and 3) microsequencing of Coomassie Brilliant Blue-stained human proteins recovered from dried 2-dimensional gels (see next section). Protein identification by means of microsequencing may be difficult, as individual members of protein families with short peptide differences may escape detection. In the gene-protein database of E. coli K-12 (14, 23), another major 2-dimensional gel database available at present, proteins are being identified by a wider range of tests that include comigration with purified proteins; genetic criteria (deletion, insertion, frameshift, nonsense, missense, regulatory); plasmid-bearing strains and in vitro synthesis of protein; selective labeling (methylation, phosphorylation); peptide map similarity; and physiological criteria and selective derivatization.



So far we have received nearly 550 antibodies from laboratories all over the world, and these are being systematically tested by 2-dimensional gel immunoblotting for antigen determination. Similarly, purified proteins and organelles provided by several laboratories have greatly aided the identification of unknown proteins (20, 21). We routinely request antibodies and protein samples and promise the donors to make available all the information we may have accumulated on that particular protein. For example, Table 1 lists entries available for lipocortin V (IEF SSP 8216), also known as annexin V, VAC-α, endonexin II, renocortin, chromobindin-5, anticoagulant protein, PAP-I, γ-calcimedin, IBC, calphobindin, and anchorin CII.

As mentioned previously, one distinct advantage of 2-dimensional gel electrophoresis is the possibility of studying quantitative variations in cellular protein patterns that may lead to the identification of groups of proteins that are expressed coordinately during a given biological process. Quantitation, however, is not an easy task, as reflected by the lack of published data on global cellular protein patterns. We believe this is partly due to difficulties in obtaining sets of gels that are suitable for computer analysis (streaking, material remaining at the origin, etc.) as well as to limitations (laborious editing time, need for calibration strips to merge images, limited dynamic range, etc.) in the computer analysis systems available at the moment. Perhaps the most advanced quantitative studies published so far using computer analysis have been carried out by Garrels and co-workers (18, 22). In particular, these investigators have established a quantitative rat protein database (18, 22) designed to study growth control (proliferation, growth inhibitors, and stimulation) and transformation in well-defined groups of cell lines obtained by transformation of rat REF52 cells with SV40, adenovirus, and the Kirsten murine sarcoma virus. These studies have revealed clusters of proteins induced or repressed during growth to confluence, as well as groups of transformation-sensitive proteins that respond in a differential fashion to transformation by DNA and RNA viruses. A most interesting feature of this quantitative database is the discovery of a group of coregulated proteins whose expression patterns are similar to that of the cell cycle-regulated DNA replication protein known as proliferating cell nuclear antigen (PCNA)/cyclin (45).
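Finding proteins "that show expression patterns similar to PCNA/cyclin" amounts to comparing quantitative profiles across conditions. The method used by Garrels and co-workers is not described here; the sketch below assumes a plain Pearson correlation against a reference profile with an arbitrary cutoff of 0.9, and all profiles are made-up numbers for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a flat profile carries no co-regulation signal
    return cov / sqrt(va * vb)


def coregulated_with(reference, profiles, threshold=0.9):
    """Spot ids whose profile correlates with the reference at or above threshold."""
    return sorted(spot for spot, profile in profiles.items()
                  if pearson(reference, profile) >= threshold)


if __name__ == "__main__":
    pcna_like = [1.0, 3.0, 5.0, 2.0]  # hypothetical levels over four conditions
    profiles = {
        "s1": [2.0, 6.1, 9.8, 4.2],   # roughly proportional to the reference
        "s2": [5.0, 4.0, 1.0, 4.5],   # roughly opposite
        "s3": [1.0, 1.0, 1.0, 1.0],   # unchanged
    }
    print(coregulated_with(pcna_like, profiles))
```

Correlation ignores absolute abundance, so a minor protein and a major one can be grouped together if they rise and fall in step.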

In our human databases, most quantitations have been carried out by estimating the radioactivity contained in the polypeptides by direct counting of the gel pieces in a scintillation counter (20, 21). Up to 700 proteins can be cut out through appropriately exposed films in a period of time comparable to that required for editing a synthetic image. Manual quantitation of this large number of spots is difficult without the assistance of a master reference image and a numbering system that can be used to identify the spots. Using this approach, we have recorded quantitative changes in the relative abundance of 592 [35S]methionine-labeled proteins synthesized by quiescent, proliferating, and SV40-transformed human embryonic lung MRC-5 fibroblasts (21). Some data concerning cytoskeletal and cytoskeletal-related proteins are presented in Fig. 2G. Our studies, as well as those of Garrels and co-workers (18, 22), may in the long run help define patterns of gene expression that are characteristic of the transformed state.
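How the counts were normalized is not detailed here. The sketch assumes relative abundance is each spot's share of the total counts recovered from all excised spots, expresses levels relative to the proliferating state, and uses an arbitrary two-fold cutoff (a hypothetical choice) to flag transformation-sensitive proteins. The example uses the lipocortin V values from Table 1.

```python
def relative_abundance(cpm_by_spot):
    """Percentage of the total counts (cpm) recovered from all excised spots."""
    total = sum(cpm_by_spot.values())
    if total <= 0:
        raise ValueError("total counts must be positive")
    return {spot: 100.0 * cpm / total for spot, cpm in cpm_by_spot.items()}


def normalize_to(levels, reference_state):
    """Express levels in each state relative to a reference state (e.g. proliferating)."""
    ref = levels[reference_state]
    return {state: value / ref for state, value in levels.items()}


def classify_change(ratio, fold=2.0):
    """Label a ratio as up, down, or unchanged using a symmetric fold-change cutoff."""
    if ratio >= fold:
        return "up"
    if ratio <= 1.0 / fold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Lipocortin V levels from Table 1, already expressed relative to P = 1.0.
    levels = normalize_to({"Q": 1.1, "P": 1.0, "T": 0.3}, "P")
    for state, ratio in levels.items():
        print(state, ratio, classify_change(ratio))
```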

OTHER 2-DIMENSIONAL GEL PROTEIN 
DATABASES 

As mentioned previously, there are other 2-dimensional gel databases available in computer form that have been published in extenso; these correspond to the E. coli K-12 protein-gene database (14, 23) and to the rat REF52 database (18, 22).

TABLE 1. Some entries for lipocortin V in the human AMA 2-dimensional gel protein database

Entries for lipocortin V (IEF SSP 8216): information entered

1. Protein name: Lipocortin V, renocortin, chromobindin-5, endonexin II, anticoagulant protein, PAP-I, VAC-α, 35-γ-calcimedin, IBC, calphobindin I, anchorin CII, annexin V
2. Percentage of total protein: 0.110% (about 2,800,000 molecules per cell)
3. Apparent molecular weight: 33.3 kDa
4. Isoelectric point (pI): 4.76
5. Method (or methods) of identification: Microsequencing, 2-dimensional immunoblotting, comigration
6. Credit to investigators that aided in identification: G. Bauw, J. Vandekerckhove, and colleagues, Rijksuniversiteit Gent; H. [illegible], BIOGEN, Cambridge; N. G. Ahn, University of Washington
7. Antibody against protein: Polyclonal (rabbit, antibody no. 20), B. Pepinsky, BIOGEN, Cambridge
8. Comigration with human proteins: Lipocortin V, N. G. Ahn, Howard Hughes Medical Institute, Washington University
9. Cellular localization: Subcortical membrane
10. Calcium/phospholipid-dependent membrane proteins: Lipocortin V
11. Function: Regulation of various aspects of inflammation, immune response, blood coagulation, and differentiation
12. Partial amino acid sequence: GTVTDFPGFDER (7-18), VLTEIIASR (109-117), QVYEEEYGSSLEDDVVG (127-143), ?GTDEEKFITIFGT(R) (187-201)
13. cDNA sequence: Known; R. Blake et al., J. Biol. Chem. 263: 10799-10811; 1988 (pI = 4.76 from translated sequence)
14. Levels in fetal human tissues: adrenal glands * * * *; brain * - - -; cerebellum « * * -; ear » + - -; eye = * * -; heart * - * -; hypophysis * ♦ *; liver ■ * - •; lung * - * -; meninges » * - •; mesonephric tissue =; striated muscle = * * -; pancreas = -; skin * *• * *; spleen » * * *; stomach = * - -; submandibular gland * - * -; small intestine = - - ~; thymus = * - *; thyroid gland - * * *; tongue * - - -; ureter » * * -
15. Levels in quiescent, proliferating, and transformed MRC-5 fibroblasts: Q (quiescent) = 1.1; P (proliferating) = 1.0; T (SV40 transformed) = 0.3
16. Distribution in Triton supernatant and cytoskeletons: Mainly supernatant

The E. coli K-12 cellular protein-gene database is perhaps the most complete of all databases reported so far, and eventually it should trace each protein back to its structural gene. Information contained in this database includes: gene/protein name (protein name, EC number, gene name); 2-dimensional gel spot designations (x-y coordinates from reference gels, alphanumeric designation); genetic information (linkage map location, physical map location, GenBank code, sequence reference, location on Kohara clones); biochemical information (molecular weight, pI, number of residues of each amino acid, mole percent of each amino acid, total number of amino acids in a polypeptide); and regulatory information (cellular level of protein in different media and at different temperatures, member of regulon, member of stimulon). Major advances of this database are envisaged in the future in view of the imminent sequencing of the whole E. coli genome as well as the development of improved methods to express cloned genes.

The rat REF52 2-dimensional gel protein database lists about 1600 proteins that have been recorded using the QUEST analysis system (18, 22). Included in this quantitative database are 1) protein names (cytoskeletal and heat shock proteins as well as various nuclear, mitochondrial, and cytoplasmic proteins), 2) annotations (subcellular localization, modification, recognition by specific antibodies, coprecipitation, NH2-terminal sequence, cross-reference to protein sequence information, and references to the literature), 3) protein sets (cytoskeletal proteins, phosphoproteins, sets of proteins with PCNA/cyclin-like properties, etc.), and 4) general quantitative data (protein synthesis during growth of normal REF52 cells to confluence and quiescence, and after restimulation of growth-inhibited cells).

In addition to the 2-dimensional gel databases mentioned so far, there are several smaller cellular databases being established in human (normal human diploid fibroblasts, lymphocytes, leukocytes, leukemic cells), mouse (NIH/3T3 cells, T lymphocytes), Aplysia, yeast (Saccharomyces cerevisiae), plants (wheat, barley, sorghum), and Euglena. Databases of tissue proteins (brain, whole mouse, liver) and body fluid proteins (plasma proteins, cerebrospinal fluid, urine, and milk) are being established in several laboratories. The reader is directed to the review by Celis et al. (4) for details and references concerning these databases.

MICROSEQUENCING HAS ADDED A NEW 
DIMENSION TO COMPREHENSIVE 
2-DIMENSIONAL GEL DATABASES: A DIRECT 
LINK BETWEEN PROTEINS AND GENES 

The development of highly sensitive gas-phase or liquid-phase amino acid sequenators (24), together with the establishment of efficient protein and peptide sample preparation methods, has opened the possibility of performing a systematic sequence analysis of proteins resolved by 2-dimensional gel electrophoresis. Indeed, the protein sequence fragments generated can be used to search for protein identity (comparison with available sequences stored in databanks) as well as to prepare specific DNA probes for cloning as-yet-uncharacterized proteins (Fig. 1). In addition, partial protein sequences can be stored in 2-dimensional gel databases (for example, see Fig. 2H) and offer a unique link between proteins and genes (Fig. 1).
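The simplest form of "search for protein identity" is to look for a sequenced peptide verbatim in the databank entries. The sketch below does exactly that, treating "X" or "?" as an unidentified residue; the databank entries are invented toy sequences, not real proteins. Practical homology searches (for example FASTA or BLAST) also tolerate substitutions and gaps, which this exact matcher does not.

```python
import re


def peptide_pattern(peptide: str):
    """Compile a partial sequence into a regex; 'X' or '?' marks an unidentified residue."""
    return re.compile("".join("." if aa in "X?" else re.escape(aa)
                              for aa in peptide.upper()))


def search_databank(peptides, databank):
    """Map each databank entry to the query peptides found in it verbatim."""
    patterns = {p: peptide_pattern(p) for p in peptides}
    hits = {}
    for name, seq in databank.items():
        found = [p for p, pattern in patterns.items() if pattern.search(seq)]
        if found:
            hits[name] = found
    return hits


if __name__ == "__main__":
    # Hypothetical toy entries, invented for illustration; not real proteins.
    databank = {
        "toy_entry_1": "MAQVLRGTVTDFPGFDERADAETLRK",
        "toy_entry_2": "MSTNPKPQRKTKRNTNRR",
    }
    queries = ["GTVTDFPGFDER", "VLTEIIASR", "?GTDEEKFITIFGT"]
    print(search_databank(queries, databank))
```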

In the early 1970s gel electrophoresis was used to purify proteins for sequencing purposes (reviewed by Weber and Osborn in ref 25). Proteins were recovered by diffusion and sequenced by manual dansyl-Edman degradation at the nanomole level. This technique was further refined by using electroelution to recover proteins and by miniaturizing the system (26). This method has been used extensively but showed increasing drawbacks (low yields, protein samples contaminated by free amino acids, and NH2-terminal blocking) as the amounts of handled protein gradually became smaller (e.g., at the 10 picomole level).

Most of the problems referred to above have been minimized with the introduction of protein-electroblotting procedures (27-32). When proteins are blotted on chemically inert membranes, it is possible to sequence the immobilized proteins directly without additional manipulations. Thus, depending on the amount of bound protein and its nature, this direct sequencing procedure generally yields NH2-terminal sequences containing 10-40 residues. This technique was used to identify, by their NH2-terminal sequences, differentially expressed major proteins from total cellular extracts separated on 2-dimensional gels. A major difficulty encountered in this procedure is the frequent occurrence of artefactual blockage of the proteins. Several studies suggest that this phenomenon is mainly due to reaction with contaminants (particularly unpolymerized acrylamide present in the gel) and to a high dilution of the protein (low concentration of the protein per unit membrane surface). In addition to this primarily technical problem, many proteins are blocked in vivo by acylation or by a pyrrolidone carboxylic acid cap.
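Why direct sequencing stops after roughly 10-40 residues, and why that number depends on the amount of bound protein, follows from the fact that each Edman cycle recovers only a fraction of the previous cycle's material. The text gives no repetitive yields or detection limits; the values in the sketch are assumptions chosen for illustration, and the model ignores lag, background, and blockage.

```python
import math


def readable_cycles(initial_pmol: float, detection_limit_pmol: float,
                    repetitive_yield: float) -> int:
    """Edman cycles before the signal drops below the detection limit.

    Assumes the signal after n cycles is initial * yield**n (constant
    repetitive yield, no lag or background), so the last readable cycle
    is floor(log(limit / initial) / log(yield)).
    """
    if not 0 < repetitive_yield < 1:
        raise ValueError("repetitive yield must lie strictly between 0 and 1")
    if initial_pmol <= detection_limit_pmol:
        return 0
    return math.floor(math.log(detection_limit_pmol / initial_pmol)
                      / math.log(repetitive_yield))


if __name__ == "__main__":
    # Hypothetical: 10 pmol bound to the membrane, 1 pmol needed to call a residue.
    for y in (0.90, 0.93, 0.95):
        print(y, readable_cycles(10.0, 1.0, y))
```

Because the readable length grows with the logarithm of the starting amount, doubling the protein loaded adds only a handful of cycles, whereas a small gain in repetitive yield adds many.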

The problem of partial or complete NH2-terminal blockage can be circumvented by generating internal amino acid sequences. This is achieved by fragmenting the protein present in the gel (gel in situ cleavage) or by cleaving it while bound to the membrane (membrane in situ cleavage) (33-35). In both cases, proteins are either cleaved in a restricted way (e.g., by limited enzymatic digestion or by using restricted chemical cleavage conditions) or fragmented into smaller peptides.



Of the different combinations examined, we obtained the best
results by using exhaustive proteolytic digestion of
membrane-immobilized proteins. This method has been
described for Ponceau red-stained proteins on nitrocellulose
blots (34), for Amido black-stained Immobilon-bound proteins,
and for fluorescamine-detected proteins on glass fiber
membranes (35). The proteases used (trypsin, chymotrypsin,
or pepsin) cleave at multiple sites, generating small peptides
that elute from the blot into the digestion buffer, from which
they are purified by reversed-phase high performance liquid
chromatography (HPLC) before being sequenced individually.
Although each of these manipulations could be expected
to reduce the yield of final sequence information, we
were surprised that the peptides could be sequenced with
high efficiency. In our hands, this approach could be
routinely applied to gel-purified proteins available in amounts
ranging from 5 to 10 µg, and often yielded sequence information
covering more than 30% of the total protein. As
membrane-immobilized proteins are not homogeneously
digested, but rather show protease-sensitive regions next to
resistant ones, the number of peptides generated is much lower
than expected from the number of potential cleavage sites.
Consequently, HPLC peptide chromatograms are less complex
and most peptides can be recovered in pure form.
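The gap between potential cleavage sites and peptides actually recovered can be made concrete by counting the sites in a sequence. The sketch below assumes the common simplified trypsin rule (cleavage after K or R, but not before P), which the text does not state, and uses a hypothetical sequence; it gives the upper bound expected for complete digestion in solution, which membrane-bound proteins do not reach.

```python
def tryptic_sites(seq: str) -> list[int]:
    """Positions (0-based) after which trypsin is expected to cut.

    Simplified rule (an assumption): cleave C-terminal to K or R,
    except when the next residue is P.
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 1)
            if seq[i] in "KR" and seq[i + 1] != "P"]


def tryptic_peptides(seq: str) -> list[str]:
    """Peptides expected from complete (exhaustive) digestion in solution."""
    seq = seq.upper()
    cuts = [i + 1 for i in tryptic_sites(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical sequence, for illustration only.
example = "MAKRPLLKGSTR"
print(len(tryptic_sites(example)), tryptic_peptides(example))
```

On a blot, only protease-accessible regions are cut, so the observed peptide count is expected to fall well below what this count predicts.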

As only limited amounts of a protein mixture can be
loaded on a 2-dimensional gel, proteins of interest are often
obtained in yields insufficient for the currently available
sequencing technology. More material can be obtained by
enriching for a certain subcellular fraction (purified cell
organelles) or by exploiting affinity (dyes, metals, drugs, etc.) or
hydrophobic properties of proteins before gel analysis. All of
the sequencing results accumulated so far in the human pro- 
tein database (20) (a few are shown in Fig. 2H) have been 
obtained from analysis of protein spots collected from 
2-dimensional gels that had been stained with Coomassie 
blue according to standard procedures and dried for storage. 
Proteins are recovered from the collected gel pieces by a 
protein-elution-concentration device, combined with gel 
electrophoresis and electroblotting. Details of this technique 
have been reported in a previous communication (42) and a 
brief outline is given below. 

Combined gel pieces are allowed to swell in gel sample 
buffer (a total volume of 1.5 ml). The gel pieces combined 
with the supernatant are then collected into a large slot made 
in a new gel. The slot is further filled with Sephadex G-10 
equilibrated in gel sample buffer. During subsequent gel
electrophoresis, most of the electrical current passes along the
sides of the slot instead of through it. This results in both a
vertical stacking and a horizontal contraction of the protein
band. With this device the protein is efficiently eluted from
the gel pieces and concentrated from a large volume into a
narrow spot. The highly concentrated protein spot (about
5 mm²) is then electroblotted onto PVDF membranes,
stained with Amido black, and in situ digested
with trypsin. The peptides generated during digestion elute 
from the membrane into the supernatant, and can be sepa- 
rated by narrow bore reversed-phase HPLC and collected in- 
dividually for sequence analysis. 

Using this and previous procedures (37, 39, 42), we have 
so far analyzed 70 protein spots collected from 
2-dimensional gels (20, and unpublished observations) (see 
for example Fig. 2H). The sequence information amounts to 
2100 allocated residues corresponding to an average of 30 
residues per protein spot. So far we have made cDNAs of 
many of the unknown proteins that have been microsequenced,
and a substantial number has been cloned and
sequenced. All available information indicates that it may be
possible to obtain partial sequence information from most of
the proteins that can be visualized by Coomassie Brilliant
Blue staining.

Partial protein sequences are stored in the database as
displayed in Fig. 2H, and it should be possible in the near
future to interface this information with forthcoming DNA
sequence data from the human genome project. In the long
run, as the human genome sequences become available, it
will be possible to assign partial protein sequences to genes
for which the full DNA sequence and chromosomal location
are known (Fig. 1).
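At its simplest, assigning a partial protein sequence to a gene means translating a DNA sequence in all six reading frames and searching for the peptide. The sketch below assumes the standard genetic code and exact matching; practical searches would also tolerate sequencing ambiguities and introns, and the function names and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard code applies; "*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate one reading frame; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> list[str]:
    """Frames 0-2 on the given strand, frames 3-5 on the reverse complement."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, reverse) for offset in range(3)]


def find_peptide(peptide: str, dna: str) -> list[tuple[int, int]]:
    """Return (frame, residue offset) for every exact match of the peptide."""
    hits = []
    for frame, protein in enumerate(six_frames(dna)):
        start = protein.find(peptide)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(peptide, start + 1)
    return hits


# Hypothetical DNA fragment and peptide, for illustration only.
print(find_peptide("AKR", "ATGGCTAAACGTCCT"))  # [(0, 1)]
```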

SUMMARY 

The studies presented in this brief review are intended to 
demonstrate the usefulness of computer-aided 2-dimensional 
gel electrophoresis and microsequencing to analyze cellular
protein patterns, and to link protein and DNA information.
As more information is gathered worldwide, comprehensive
databases will depict an integrated picture of the expression
levels and properties of the thousands of proteins that orches- 
trate most cellular functions. 

Clearly, databases allow easy access to a large body of data 
and provide an efficient medium to communicate stan- 
dardized protein information. In the future, databases will 
foster a wide variety of biological information that can be 
used to support collaborative research projects in basic and 
applied biology as well as in clinical research (2, 5, 46). Once
a protein is identified in a particular database, all the
information gathered on it can be made available to the scientist.
However, many problems must be solved before protein
databases become of general use to the scientific community.
A most urgent one is to promote standardization of the gel
running conditions so that data produced in a given laboratory
may be used worldwide. Surprisingly, gel running technology as
it stands today is still largely a matter of craftsmanship.

Finally, comprehensive, computerized databases of pro- 
teins, together with recently developed techniques to 
microsequence proteins, offer a new dimension to the study 
of genome organization and function (Fig. 1). In particular, 
human protein databases may become increasingly impor- 
tant in view of the concerted effort to map and sequence the 
entire human genome. This formidable task is expected to 
dominate biological research in the next decades. 

We would like to thank S. Himmelstrup Jorgensen for typing the
Preparation of human tumors for analysis by 2-D electrophoresis

Nonenzymatic extraction of cells from clinical tumor 
material for analysis of gene expression by two- 
dimensional polyacrylamide gel electrophoresis 

We have compared different methods of preparation of malignant cells for 
two-dimensional electrophoresis (2-DE). We found all methods using fresh
tissue to be superior to methods using frozen tissue. Our results
indicate that nonenzymatic methods of preparation of tumor cells, including
fine needle aspiration, scraping and squeezing, have advantages over methods
using enzymatic extraction of cells. Nonenzymatic methods are rapid, appear
to reduce loss of high molecular weight protein species, and alleviate the
necessity of separating viable and nonviable cells by Percoll gradient
centrifugation. Using these techniques, high-quality 2-DE maps were derived
from tumors of the lung and breast. In the resulting polypeptide patterns,
heat shock proteins, non-muscle tropomyosins and intermediate filament
proteins were identified. We conclude that nonenzymatic extraction of
malignant cells from fresh tumor tissue improves the prospects that these
techniques may be useful in clinical diagnosis.



1 Introduction 

Tumors may develop by a number of different mechan- 
isms in any given cell type. At the time of diagnosis, 
tumors will have progressed along different pathways to 
various stages of malignancy. To provide a basis for indi- 
vidual therapy it is of importance to examine specific 
properties of the tumor cell population in each patient. 
A large number of different markers have been described
in order to increase diagnostic accuracy. It is likely that
a combination of several markers will be needed in the
future to reflect different properties of the tumor. One
important method for resolving a large number of
potential markers is two-dimensional electrophoresis
(2-DE). Extensive efforts are being made to identify
various polypeptides separated by 2-DE and to
characterize how their expression is affected by cellular
transformation and by various culture conditions [1, 2].
It would be of value to transfer this information to 2-DE
separations of polypeptides from tumor tissue samples.
However, one prerequisite is that the quality of 2-DE gels
from tumor samples is comparable with that of 2-DE gels
from samples of cultured cells.

Frozen tumor tissues are commonly used for various bio- 
chemical assessments. However, if such samples are ana- 
lyzed by 2-D polyacrylamide gel electrophoresis (PAGE), 
the polypeptide patterns are obscured by contaminating
serum and connective tissue proteins. Such non-tumor-cell-related
variations represent serious problems in the
interpretation and inter-patient comparison of 2-DE
patterns [3]. 2-DE patterns of cells prepared from fresh
tumor material have been analyzed after enzymatic extraction
of tumor cells [4, 5] or after culturing tumor fragments in
medium containing radioactive amino acids [6]. These
procedures may, however, lead to alterations in the gene
expression/polypeptide patterns. We are only aware of
one study where nonenzymatic extraction of cells from
fresh tumor tissue (prostate cancer) was used to prepare
samples for 2-D PAGE [4]. We have examined enzymatic
extraction and various nonenzymatic preparation tech- 
niques, including fine needle aspiration, for the prepara- 
tion of cells from fresh tumor tissues. We describe 
nonenzymatic extraction procedures that are rapid, lead 
to high-quality 2-DE patterns, and that alleviate the 
necessity to purify tumor cell populations from dead 
cells. 



2 Materials and methods 

2.1 Cell cultures and samples used for spot 
identification 

A rat embryonal fibroblast cell line, WT2 (a kind gift
from Dr. J. I. Garrels and Dr. S. Pattersson), was used for
the identification of a number of heat shock and structural
proteins. Human normal diploid lung fibroblasts (WI38)
and human epithelial breast carcinoma cells (MDA-231
and MCF-7) were purchased from ATCC and grown
as recommended. Polypeptides prepared from a leukemia
of the pre-B-ALL type were separated by 2-DE. The
2-DE map was then analyzed by Dr. S. M. Hanash
(University of Michigan, Ann Arbor, USA).

2.2 Tumor tissue samples

In this study, 2-DE maps from seven tumors were used
as representative illustrations: two adenocarcinomas of
the lung (LA, and LB, mucinous; both of intermediate
grade of differentiation), one squamous carcinoma
of the lung (LS), one carcinoid-like breast cancer (BC),
one microfollicular adenoma (highly differentiated) of
the thyroid (TA), one highly differentiated hypernephroma,
a tumor of the kidney (KH), and finally one case
of poorly differentiated corpus carcinoma (CP).

2.3 Preparation of cultured cells

The cell monolayers were washed twice in phosphate
buffered saline (PBS) and then scraped off in ice-cold
PBS including protease inhibitors (PIH: 0.2 mM
phenylmethylsulfonyl fluoride (PMSF) and 0.83 mM
benzamidine), pelleted at 660 × g for 3 min (+4 °C) and
washed once before final centrifugation at 2700 × g for
5 min. The wet weight of the cell pellet was recorded and
the cells were stored at -80 °C until further processing.
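The protocol specifies most spins as relative centrifugal force (660 × g, 2700 × g) but one as rotor speed (10000 rpm, Section 2.4.4). Converting between the two needs the rotor radius, which the text does not give. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ · r · N² (r in cm, N in rpm) with a hypothetical 10 cm radius.

```python
import math

# Standard relation between rotor speed and relative centrifugal force:
#   RCF = 1.118e-5 * r * N**2, with r in cm and N in rpm.
K = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given rotor radius."""
    return K * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a target RCF at a given radius."""
    return math.sqrt(rcf / (K * radius_cm))


# The rotor radius is not stated in the text; 10 cm is a hypothetical value.
radius = 10.0
for g in (660, 2700):
    print(f"{g} x g  ->  {rpm_from_rcf(g, radius):.0f} rpm")
print(f"10000 rpm ->  {rcf_from_rpm(10000, radius):.0f} x g")
```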

2.4 Preparation of tumor tissue samples 

2.4.1 General remarks 

Macroscopically representative and non-necrotic tumor 
tissues were selected within 20 min after resection. 
Parallel samples were routinely prepared for cytology. 
The samples were processed as rapidly as possible on ice
or at +4 °C and in the presence of PIH. Cells were
stained with Diff-Quick (Baxter) and usually examined on
three occasions during the preparation procedure:
(i) cytology sample, (ii) extracted cells and (iii)
cells after Percoll gradient centrifugation.

2.4.2 Specimen acquisition 

The strategy of sample preparation is shown in Fig. 1.
Tumor tissue cell samples were usually obtained by fine
needle aspiration (NA) using a 0.7 mm needle. The
syringe was filled with 1-2 mL of ice-cold culture
medium/PIH. We found that if a tumor appeared to be very
fibrous, it was difficult to extract enough cells for 2-DE
analysis. In these cases, two alternative techniques were
examined. (i) The tumor was cut in the middle and the
fresh surface scraped (SC) with a scalpel. The cell-rich
material was then transferred to ice-cold culture
medium (L15 with 5% fetal calf serum)/PIH. (ii) A part
of the tumor sample was placed in culture medium on
ice for further processing in the laboratory in the
following way: the material was cut into very small
fragments on a pre-cooled dissection plate and transferred
to a small glass chamber with a 0.7 mm metal net 5 mm
above the bottom of the chamber. Medium/PIH (8 mL) was
added to cover the sample, which was gently
squeezed (SQ) against the net in order to release and
wash out cells. NA and SC were also compared with an
enzymatic extraction (EE) procedure described
previously [5]. Briefly, thin slices of tissue were incubated
with collagenase (1 mg/mL) and elastase (2 mg/mL) in
medium for 1 h at 37 °C. Extracted cells from every
sample were then subjected to Percoll gradient
centrifugation (Section 3.2.3).

2.4.3 Separation of cells by Percoll gradient
centrifugation 

The cell suspension was filtered through two nylon mesh
filters, (i) 250 µm and (ii) 100 µm, and then centrifuged
at 660 × g for 3 min. The cell pellet was resuspended
carefully in medium using a syringe and loaded onto a
two-step discontinuous Percoll/PBS gradient, 20% (density
= 1.03 g/mL) and 54.7% (density = 1.07 g/mL),
and centrifuged at 1000 × g for 15 min. In this system,
dead cells stay on top, viable cells sediment to the
interphase and erythrocytes sediment to the bottom. The
viability of cells in the top fraction and interphase was
checked by the trypan blue exclusion test. The interphase
cell layer (> 90% viability) was collected and
washed once in a large volume of PBS/PIH (centrifuged
at 800 × g for 3 min). Finally, the cells were resuspended
in 1.4 mL PBS and pelleted at 2700 × g for 5
min. The wet weight (WW) was recorded and the pellet
was then stored at -80 °C.
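The two gradient densities follow from the Percoll percentages if one assumes simple linear mixing. The sketch below uses a stock Percoll density of about 1.130 g/mL and a PBS density of about 1.005 g/mL; both are typical values assumed here, not given in the text. It also shows the trypan blue viability fraction used to check the collected layers, applied to a hypothetical cell count.

```python
# Assumed densities (not stated in the text): undiluted Percoll ~1.130 g/mL,
# PBS ~1.005 g/mL. Linear mixing is an approximation.
PERCOLL_STOCK_G_PER_ML = 1.130
PBS_G_PER_ML = 1.005


def percoll_density(percent_percoll: float) -> float:
    """Approximate density (g/mL) of a Percoll/PBS mix by linear mixing."""
    if not 0 <= percent_percoll <= 100:
        raise ValueError("percentage must be between 0 and 100")
    f = percent_percoll / 100.0
    return f * PERCOLL_STOCK_G_PER_ML + (1.0 - f) * PBS_G_PER_ML


def trypan_blue_viability(unstained: int, stained: int) -> float:
    """Fraction of counted cells that exclude trypan blue (i.e., viable)."""
    total = unstained + stained
    if total == 0:
        raise ValueError("no cells counted")
    return unstained / total


for pct in (20.0, 54.7):
    print(f"{pct:>5}% Percoll -> {percoll_density(pct):.2f} g/mL")
# Hypothetical interphase count: 184 unstained and 16 blue cells.
print(f"viability: {trypan_blue_viability(184, 16):.0%}")
```

Under these assumptions the 20% and 54.7% layers come out at about 1.03 and 1.07 g/mL, matching the densities stated above.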

2.4.4 Final preparation of cells for 2-D PAGE analysis 

From this point, cultured cell samples were treated
in the same way as tumor cell samples: each cell pellet
was thawed on ice and resuspended in 1.89 µL mQ water
per mg WW (= 1.89 × WW µL). The suspension was
frozen and thawed 4-5 times to break the cells [7]. A
volume of (0.089 × WW) µL 10% sodium dodecyl
sulfate (SDS), including …% mercaptoethanol, was
mixed with the sample and incubated 5 min on ice with
(0.329 × WW) µL of a solution of DNase I (0.144
mg/mL in 20 mM Tris-HCl with 2 mM CaCl2 × 2H2O, pH
8.8) and RNase A (0.0718 mg/mL Tris) [8, 9]. The sample
was frozen and lyophilized. Sample buffer [10] including
PMSF (0.2 mM), EDTA (1.0 mM), 0.5% Nonidet P-40
(NP-40) and 3-[(3-cholamidopropyl)-dimethylammonio]-1-propane
sulfonate (CHAPS; 25 mM) was added carefully,
mixed for 2.5 h and centrifuged for 15 min at
10000 rpm to remove any insoluble material. Duplicate
or triplicate samples were taken for protein determination
[11]. Samples were stored at -80 °C prior to isoelectric
focusing (IEF).

Figure 1. Experimental flow chart showing the main steps of the
preparation procedures. The abbreviations used for nonenzymatic
extraction procedures are: FZ, frozen sample preparation; NA, needle
aspiration; SC, scraped; and SQ, squeezed sample. Extracted cells are
then loaded as a suspension (top volume of each tube) onto either
1.07 g/mL Percoll (left), or a discontinuous Percoll gradient from the
nonenzymatic extraction (middle), or from enzymatic extraction
(right). Cellular top and interphase fractions are then used for 2-DE.
For details see Section 2.
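The lysis volumes in Section 2.4.4 all scale linearly with the pellet wet weight (WW, in mg). The small helper below only multiplies WW by the per-mg coefficients stated in the text. The function name, dictionary keys and the 12 mg example pellet are hypothetical.

```python
# Per-milligram coefficients (µL per mg wet weight) taken from Section 2.4.4.
COEFFICIENTS_UL_PER_MG = {
    "water": 1.89,            # resuspension before freeze-thaw lysis
    "sds_10_percent": 0.089,  # 10% SDS with mercaptoethanol
    "nuclease_mix": 0.329,    # DNase I / RNase A solution
}


def lysis_volumes(wet_weight_mg: float) -> dict[str, float]:
    """Scale each reagent volume (µL) to the pellet wet weight (mg)."""
    if wet_weight_mg <= 0:
        raise ValueError("wet weight must be positive")
    return {name: round(coef * wet_weight_mg, 3)
            for name, coef in COEFFICIENTS_UL_PER_MG.items()}


# A hypothetical 12 mg pellet.
for reagent, volume in lysis_volumes(12.0).items():
    print(f"{reagent}: {volume} µL")
```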
2.4.5 Preparation of frozen tumor tissue

The technique has been described previously [3, 12].
Briefly, the sample is ground frozen to a fine powder,
homogenized, lyophilized and solubilized in sample
buffer.

2.4.6 Control of representativeness

The tumors were examined routinely by experienced 
pathologists and smears or imprints from the samples 
were also assessed for cytometric DNA content by 
microspectrophotometry. 

2.5 2-D PAGE 

2-D PAGE was performed as described [8, 10] except for
the following details. The glass tubes for IEF (1.2 × 200
mm) contained 2.0% Resolyte pH 4-8 (BDH) and were
cast to a height of 180 mm. A stock solution of acrylamide
(Serva) and N,N'-methylenebisacrylamide (16.7:1
for IEF and 37.5:1 for the second dimension) was
deionized by mixing with 5% w/v Duolite MB 5313 mixed-resin
ion exchanger (BDH) for 30 min, filtered (with a
0.22 µm nitrocellulose filter) and stored at -70 °C.
N,N'-Methylenebisacrylamide, N,N,N',N'-tetramethylethylenediamine
(TEMED) and ammonium persulfate were
purchased from Bio-Rad. IEF tubes were prefocused at
200 V for 60 min. To each tube a sample corresponding to
20-40 µg protein was applied and focused for 14.5 h at
800 V and finally 1.0 h at 1000 V using a Protean II cell
(Bio-Rad) and a Model 1000/500 Power Supply (Bio-Rad).
The tube gels were finally extruded into 1.25 mL
equilibration buffer containing 60 mM Tris, pH 6.8, 2% SDS,
100 mM dithiothreitol and 10% glycerol, frozen on dry
ice and stored at -70 °C. The second-dimension gels (1.0 ×
180 × 90 mm) had an acrylamide concentration of 10% T
and contained 376 mM Tris, pH 8.8, and 0.1% SDS. IEF
gels were applied on top of the slab gel, sealed
with 0.5% agarose containing electrophoresis running
buffer (60 mM Tris-base, 0.2 M glycine and 0.1% SDS)
and electrophoresed at 10-11 mA per gel (constant
current) at +10 °C. Six gels were run together in a
Protean II xi 2-D Multi-Cell (Bio-Rad). Proteins were
visualized by silver staining and photographed with the
acidic side to the left [13, 14].
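Two quantities implicit in the settings above can be worked out directly: the volt-hours delivered during focusing and the crosslinker fraction (%C) implied by the acrylamide:bisacrylamide ratios. The sketch uses the usual definitions Vh = Σ V·t over constant-voltage steps and %C = bis / (acrylamide + bis) × 100. It reports prefocusing separately, since whether to count it toward the total is a choice the text does not make.

```python
def volt_hours(steps: list[tuple[float, float]]) -> float:
    """Sum of voltage (V) x duration (h) over constant-voltage steps."""
    return sum(volts * hours for volts, hours in steps)


def percent_c(acrylamide_parts: float, bis_parts: float = 1.0) -> float:
    """Crosslinker fraction %C for an acrylamide:bis weight ratio."""
    return 100.0 * bis_parts / (acrylamide_parts + bis_parts)


prefocus_vh = volt_hours([(200, 1.0)])                # 200 V for 60 min
focus_vh = volt_hours([(800, 14.5), (1000, 1.0)])     # focusing steps
print(prefocus_vh, focus_vh)                          # 200.0 12600.0
print(round(percent_c(16.7), 2), round(percent_c(37.5), 2))  # 5.65 2.6
```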

2.6 Identification of polypeptides 

Vimentin and vimentin-derived polypeptides were identified
by extraction of an MDA-231 cell lysate with 0.6 M
KCl/0.5% NP-40 [15]. Tropomyosins were extracted
from MDA-231 and WI38 cell lysates [16], and
cytokeratins were extracted from MDA-231 and MCF-7 cell
lysates [17]. The patterns were compared with published
maps [19-21]. Proliferating cell nuclear antigen (PCNA)
was identified by immunoblotting (PC10 mAb, Dakopatts)
using a semidry system (Multiphor II Nova Blot,
Pharmacia-LKB Biotechnology AB) and enhanced
chemiluminescence (ECL) detection (Amersham).

3 Results 

3.1 2-DE of samples prepared from normal and
tumorigenic cultured cells

The object of this study was to develop methods for pre- 
paration of 2-DE maps from human tumor tissue which 
have the same high resolution as those obtained from 
cultured cells. Shown in Fig. 2 are high resolution 2-DE
gels prepared from cultured cells and one leukemia:
SV40 transformed embryonal rat fibroblasts WT2 (Fig.
2a); human MDA-231 breast carcinoma cells (Fig. 2b);
human WI38 fibroblasts (Fig. 2c) and human pre-B-ALL
cells (Fig. 2d). Polypeptides were identified through a
laboratory exchange of cell samples/2-DE maps and
through 2-DE analysis of purified proteins (Table 1).

3.2 Preparation of samples from solid tumors 
3.2.1 Fresh versus frozen tissue 

An adenocarcinoma of the lung (LA) was prepared for 
2-DE by conventional methods using frozen material 
(Fig. 3a). There are several possible reasons for the poor
resolution obtained with frozen tissue, including the presence of high
molecular weight protein aggregates. Filtering extracts
through 0.1 µm filters (Durapore, Millipore) resulted in
a slightly improved resolution (not shown). When fresh
tumor tissue from tumor LA was used for sample preparation,
using fine needle aspiration to collect the cells,
the resolution was considerably improved (Fig. 3b). The
use of fresh tissue resulted in a general increase in
resolution, which was most pronounced in the 50-100 kDa
molecular mass range. A number of differences in the
protein profiles of the gels in Figs. 3a and 3b can be
observed, some of which are indicated in the figures. The
decrease in serum albumin in Fig. 3b is likely to result
from loss of serum proteins when the cells were
pelleted after aspiration. Other differences, such as the
decreased level of transformation-sensitive tropomyosins
(TM1-TM3), may result from enrichment of tumor cells
in the sample of Fig. 3b. Fine needle aspiration, a
well-established technique in cytology, extracts mainly tumor
cells because of the decreased intercellular adhesiveness of
neoplastic cells compared with normal tissue. Microscopic
examination of Diff-Quick-stained extracted cells
from case LA revealed almost 100% tumor cells,
whereas the whole tissue extract contained approximately
60% tumor cells.
Figure 4. 2-DE analysis of a case of breast carcinoma (BC). Comparison
of 2-DE quality and some differences in detected spots (arrowheads
indicate increased intensity and circles or bracket indicate decreased
intensity of the same spots) between (A) enzymatically and (B)
nonenzymatically prepared samples.

Table 1. Names and abbreviations for identified spots

Spot     Name                                   Basis for identification
A        Actins                                 a
aA       α-Actinin                              a
B23      Protein B23/Numatrin                   a
EF2      Elongation factor 2                    a
EF1      Elongation factor 1                    a
GT       Glutathione S-transferase (π)          a
hsp60    Heat shock protein 60                  a
hsp73    Heat shock protein 73                  a
hsp80    Heat shock protein 80, GRP78, BiP      a
hsp90    Heat shock protein 90                  a
hsp100   Heat shock protein 100, Endoplasmin    a
IFA      Intermediary filament associated       a
K8       Cytokeratin 8                          b and a
LamB     Lamin B                                a
Lip1     Lipocortin I                           a
Lip2     Lipocortin II                          a
Lip5     Lipocortin V                           a
Mit1     Mitcon 1/…                             a
Mit2     Mitcon 2                               a
Mit3     Mitcon 3                               a
MRP      Mucin-related polypeptides
PCNA     Proliferating cell nuclear antigen     c and a
PLC      Phospholipase C (1)                    a
RO       Ro/SS-A antigen                        a
SA       Serum albumin                          b and a
aT       α-Tubulin                              a
bT       β-Tubulin                              a
tm1      Non-muscle tropomyosin isoform 1       b and a
tm2      Non-muscle tropomyosin isoform 2       b and a
tm3      Non-muscle tropomyosin isoform 3       b and a
tm4      Non-muscle tropomyosin isoform 4       b and a
tm5      Non-muscle tropomyosin isoform 5       b and a
TPI      Triose phosphate isomerase             a
V        Vimentin                               b and a
Vid1     Vimentin-derived protein               b and a
Vid2     Vimentin-derived protein               b and a
Vid3     Vimentin-derived protein               b and a
Vid4     Vimentin-derived protein               b and a
Vin      Vinculin                               a

a. Homologous position with respect to other mammalian systems
b. Purified protein(s)
c. Immunoblotting
3.2.2 Comparison of different methods for preparing
cells from fresh tumor tissue

Samples were prepared from breast and lung carcinomas
using either an enzymatic treatment with collagenase/
elastase or nonenzymatic preparations (Fig. 4). A
number of differences in the protein profiles were
observed in the resulting 2-DE gels, some of which are
indicated in Figs. 4a and b. These differences include
both increases and decreases in spot intensity. They
may result from degradation of high molecular
weight polypeptides during enzymatic treatment or from
increased solubilization of polypeptides, or may have other
causes. For many tumors, it was only possible to obtain
small amounts of material since the tissue was reserved for
other examinations. In these cases, samples could be
prepared for 2-DE using either needle aspiration or
scraping. Figure 5a shows a 2-DE gel prepared from
squamous lung carcinoma (LS) cells collected by needle
aspiration, and Fig. 5b shows a gel prepared from the
same tumor by scraping. In this case, a number of
differences were recorded between the two procedures, some
of which are arrowed in Fig. 5. Samples obtained from
other tumors (breast and lung) generally showed fewer
differences between these two methods of cell sampling
(not shown). These data show that different nonenzymatic
extraction procedures may yield different polypeptide
patterns. However, the number of spots with a large
difference in intensity was lower than when a nonenzymatic
preparation was compared with an enzymatic preparation.

Figure 6. 2-DE analysis of three other types of tumors: (A)
hypernephroma, (B) an adenoma of the thyroid and (C) corpus
cancer, using the nonenzymatic preparation technique. Arrowheads
and circles indicate some cytosolic polypeptides.

2-DE maps of satisfactory quality were prepared by a
third procedure, in which cells were released from small
pieces of tumor by squeezing (see Section 2). Examples
are shown in Fig. 6, where 2-DE maps derived from a case
of hypernephroma, KH (Fig. 6a), a case of thyroid tumor,
TA (Fig. 6b), and a case of corpus cancer, CP (Fig. 6c),
can be seen. We conclude that nonenzymatic techniques
are useful for 2-DE analysis of a number of different
tumors. The quality of the resulting gels is comparable
to that obtained using cultured cells (compare the gels in
Fig. 2 with those in Figs. 4, 6 and 7). Which of
these methods is optimal will, in our experience,
depend on the tumor material. For example, very small
tumors are preferably extracted by squeezing; on the
other hand, breast cancers (which are often fibrous)
yield satisfactory samples using scraping.

3.2.3 Purification of cells on Percoll gradients

We considered the possible advantage of separating
viable cells from dead cells, erythrocytes, and debris
using discontinuous Percoll gradients. Cells collected
from the interphase showed a viability of more than
90% as judged by the trypan blue exclusion test. However,
it was found that the yield of viable cells decreased
dramatically if the tissue resection was not processed
immediately. To study the effect of lysis of cells during the
preparation procedure, 2-DE maps were prepared from
nonenzymatically extracted cells of case LB collected
from the top fraction (nonviable, Fig. 7a) and the interphase
fraction (viable, Fig. 7b). These 2-DE maps were
compared with the corresponding fractions (nonviable, Fig.
7c, and viable, Fig. 7d) of enzymatically extracted cells.
One clear disadvantage of the enzymatic technique was
that when loss of cell viability occurred during preparation,
a dramatic loss of high molecular weight polypeptides
was observed (Fig. 7c). This was probably due to
degradation of intracellular proteins. In contrast,
nonenzymatic preparations showed fewer differences between
viable and nonviable cells: the most pronounced alteration
was a decrease of a group of mucin-related proteins
(Fig. 7b). We conclude, therefore, that a discontinuous
Percoll gradient is necessary after enzymatic
extraction of cells, but can be omitted from the nonenzymatic
tumor sample preparation procedure.

Figure 7. 2-DE analysis of polypeptides from viable (b and d) and
nonviable (a and c) cells of an adenocarcinoma of the lung (LB),
separated using a discontinuous Percoll density gradient. The
nonenzymatic preparation technique (a and b) and the enzymatic
preparation technique (c and d) are compared.

We used the MDA-231 cell line to study the effects of 
cell lysis and leakage of cytosolic polypeptides during 
sample preparation. Remarkably, after 30, 50, 80 and 140
min of incubation in PBS/PIH at 0 °C, no significant
changes were observed in the 2-DE pattern (not shown).
Although loss of cell viability may not result in protein
degradation when cells are incubated in the presence of
protease inhibitors, loss of cytosolic proteins would be
expected during pelleting of cells. We monitored the loss
of lactate dehydrogenase (LDH) activity into the supernatant
during incubation of MDA-231 and MCF-7 breast
cancer cells in PBS at 20 °C. In both cases, loss of
viability was paralleled by release of LDH from the cells
(Fig. 8). After 5 h, 70% of the MCF-7 cells, but only 30%
of the MDA-231 cells, were dead (not shown).




Figure 8. The relative release (fraction in supernatant of total) of
lactate dehydrogenase activity (LDH) and cell viability versus
incubation time of the mammary carcinoma cell lines MDA-231 and
MCF-7 during incubation in PBS at 20 °C.



These data indicate the importance of a rapid preparation
procedure, at low temperature, for fresh tumor samples.
Experiments have also been performed using only
1.07 g/mL Percoll (Fig. 6c and Fig. 1, left test tube) in
order to remove erythrocytes. One clear advantage of
this procedure, which is now routinely used, is a
higher yield of viable cells, probably due to decreased
sample preparation time.



4 Discussion 

We describe procedures for sample preparation from 
solid tumors for 2-DE. 2-DE maps could be derived 
from solid tumors which were similar in quality to those 
obtained from cultured cells. Compared to methods 
using frozen material, the resolving power of the 2-DE 
technique is increased, allowing examination of a large 
number of polypeptides from tumors of different malignancies.
Other investigators [12, 22] have used samples
from frozen tumors to derive 2-DE maps. We have previously
described disadvantages encountered using frozen
tumor samples, including variations in contaminating
proteins between different samples [3]. The methods
described here are based on the preparation of cells from
tumors without enzymatic digestion. The enzymatic step
could be avoided since malignant cells usually grow as
solid masses which are not strongly attached to the
matrix. Furthermore, we found that omitting the enzymatic
digestion alleviated the necessity of purifying
viable tumor cells on Percoll gradients. This was in sharp
contrast to enzymatically treated samples, where loss of
viability leads to loss of high molecular weight proteins 
(Fig. 7c). 

At least in the case of lung cancer, viable and nonviable
cells showed only small differences in their 2-DE maps.
Presumably, protease inhibitors penetrate the cells and
inhibit proteolysis. In model experiments, we observed
leakage of a cytosolic protein (LDH) from the cells in
parallel with loss of viability. However, only a
limited decrease in the level of low molecular weight
cytosolic polypeptides was detected using silver staining
combined with visual inspection. We have found that
although some tumors are well suited for the preparation
procedure described, others are not. In general,
good results were obtained using tumors of the lung,
breast and corpus, and lymphomas. In contrast, cells from
thyroid adenomas and hypernephroma showed poor
viability. In these cases we were unable to separate nonviable
cells from viable cells, and we therefore cannot
evaluate the consequences of the loss of viability on
2-DE patterns, apart from a loss of some low molecular
weight cytosolic polypeptides.

Highly differentiated tumors may show lower viability as 
compared with poorly differentiated tumors (Dr. Farkas 
Vanky, personal communication). A number of samples 
from thyroid tumors were prepared for 2-DE but most 
cases showed poor viability. We believe that special care
is needed during the preparation of highly differentiated
tumors in general. The difference in loss of viability and
leakage of LDH between the more differentiated MCF-7
cells and the less differentiated MDA-231 cells is in line
with these observations (Fig. 8). A number of potential
and interesting markers, such as tropomyosin isoforms,
cytokeratins and heat shock proteins, appear to be insensitive
to loss of viability during the preparation procedure.
We have to date made numerous observations of altera- 
tions in the expression of these polypeptides in breast 
cancers and lung cancers. 

Another problem that may occur, irrespective of the sample preparation technique used, is admixture of lymphocytes. Such cases are easily detected in smears, and it may therefore be possible to select lymphocyte-specific spots as "internal markers" for the 2-D PAGE analysis. Studies using this approach are in progress. Many of the polypeptides identified are structural (Table 1). Since the expression of many of these polypeptides is known to vary between normal and malignant cells, the possibility of determining their expression simultaneously is appealing. In the specific case of breast cancer, alterations in the expression of intermediate filament proteins (cytokeratins) are known to occur during tumor progression [23]. Other proteins known to be differentially expressed between normal and transformed cells are tropomyosins, numatrin/B23, heat shock proteins and PCNA. In this context, we have observed alterations in the expression of cytokeratin 8, hsp 90 and non-muscle tropomyosin isoform 2 during malignant progression (Okuzawa et al., in preparation; Franzen et al., in preparation).

The method of choice for sample preparation from tumor tissues will depend on the properties of the tumor material studied. It may be important to use only one method when comparing cases within one group, since differences were observed between methods. The advantages of the nonenzymatic technique are that (i) it minimizes contamination with connective tissue, (ii) problems with contamination by serum proteins are avoided, and (iii) separation of viable and dead cells is not necessary. The resolving power of 2-D PAGE is thereby maximized for the analysis of human tumors, and studies on inter-tumor variations in gene expression are facilitated. In addition, the polypeptide patterns obtained may be more representative of the tumor cell in vivo, since the use of enzymes and incubations has been minimized.

We would like to thank Dr. J. I. Garrels, Dr. S. Pattersson, Dr. S. M. Hanash and Dr. J. E. Celis for making sample and 2-DE map exchanges possible. This study was sup-

Reference points for comparisons of two-dimensional
maps of proteins from different human cell types 
defined in a pH scale where isoelectric points correlate 
with polypeptide compositions 

A highly reproducible, commercial and nonlinear, wide-range immobilized pH gradient (IPG) was used to generate two-dimensional (2-D) gel maps of [35S]methionine-labeled proteins from noncultured, unfractionated normal human epidermal keratinocytes. Forty-one proteins, common to most human cell types and recorded in the human keratinocyte 2-D gel protein database, were identified in the 2-D gel maps, and their isoelectric points (pI) were determined using narrow-range IPGs. The latter established a pH scale that allowed comparisons between 2-D gel maps generated either with other IPGs in the first dimension or with different human protein samples. Of the 41 proteins identified, a subset of 18 was defined as suitable to evaluate the correlation between calculated and experimental pI values for polypeptides with known composition. The variance calculated for the discrepancies between calculated and experimental pI values for these proteins was 0.001 pH units. Comparison of the values by the t-test for dependent samples (paired t-test) gave a p-level of 0.49, indicating that there is no significant difference between the calculated and experimental pI values. The precision of the calculated values depended on the buffer capacity of the proteins and, on average, improved with increased buffer capacity. As shown here, the widely available information on protein sequences cannot, a priori, be assumed to be sufficient for calculating pI values, because post-translational modifications, in particular N-terminal blockage, pose a major problem. Of the 36 proteins analyzed in this study, 18-20 were found to be N-terminally blocked, and of these only 6 were indicated as such in databases. The probability of N-terminal blockage depended on the nature of the N-terminal group. Twenty-six of the proteins had M, S or A as N-terminal amino acid, and of these 17-19 were blocked. Only 1 in 10 proteins with other N-terminal groups was blocked.



1 Introduction 

As compared with carrier ampholyte isoelectric focusing (CA-IEF), the application of immobilized pH gradients (IPGs) in the first dimension in 2-D gel electrophoresis offers improved reproducibility [1], because the nature of the pH gradient makes the resulting focusing positions insensitive to the focusing time [2] and to the type of sample applied [3]. The recently introduced ready-made IPG strips [4] seem to be an ideal substitute for the carrier ampholyte gradients, which until now have been the most commonly used first dimensions in 2-D gel electrophoresis. The availability of standardized first dimensions opens the possibility of comparing 2-D gel maps of various cell types generated in different laboratories, provided that the focusing positions of a number of easily recognizable polypeptide spots common to the cell types in question are known. Even though this approach is limited to experiments performed with the same standardized IPG, the flexibility provided by IPGs allows the pH gradient to be adjusted to the requirements of a particular experiment.

Exchange and communication of 2-D gel protein data require a pH scale that is independent of the particular IPG used and in which the results can be described. The introduction of carbamylation trains, and the relation of focusing positions to the spots in these trains, represented a step forward towards solving the reproducibility problem experienced with carrier ampholyte focusing [5]. Problems associated with the use of carbamylation trains were mainly due to lack of temperature control and to the use of nonequilibrium focusing conditions. Accordingly, the pattern variation involved not only the resulting pH gradients, but also the relative spot positions, both with respect to each other and to the spots in the carbamylation trains. Even though the question of reproducibility has, to a large extent, been solved, the carbamylation trains are still not ideal as markers, because the spots in the trains do not represent defined entities but rather a large number of differently carbamylated peptides having close pI values. As a result, the spots are large and poorly defined as compared with the ordinary polypeptide spots in 2-D gel maps.
Neidhardt et al. [6] defined the pH gradient in 2-D gel experiments by pI markers whose pI values were calculated from the amino acid composition. Focusing positions of other polypeptides could be predicted from their composition, but the pK values needed for the pI calculations were unknown. Various groups employing this approach do not use the same pK values [6, 7] and, therefore, the pI values derived in this way cannot be expected to describe the variation of the hydrogen ion activity. In spite of this, it is still possible to make approximate predictions of focusing positions, because the pK values used to define the pH gradient are also used to calculate pI values and to predict the focusing positions; errors in the pK assignments are therefore compensated. A pH scale that correctly reflects the variation in hydrogen ion activity during focusing should improve the precision of the predictions, but this has never been implemented with CA-IEF as the first dimension in 2-D gel electrophoresis. The main reason for this lies in the problems associated with pH measurements in focused gels containing high concentrations of urea.

IPGs can be described from the concentration variation of the immobilized groups, provided that the pK values of these groups are known for the conditions prevailing during focusing. To avoid measurements on gels, Gianazza et al. [8] suggested the use of pK values derived by adding determined pK shifts. Recently, direct determinations of pK differences between immobilized groups in IPGs were made by determining pI-pK values in overlapping narrow-range IPGs [9, 10], and the results verified the applicability of the Gianazza approach. A description of the focusing results in a pH scale that correctly describes the variation of the hydrogen ion activity for the focusing conditions used not only allows the comparison of 2-D gel maps generated with different IPGs, but also opens the possibility of correlating the focusing position of a polypeptide with its composition [9]. Experiments by Bjellqvist et al. [9, 10] have implied that pH scales showing good correlation between calculated and experimental pI values can be derived for any of the conditions commonly used for focusing in connection with 2-D gel electrophoresis. These pH scales are then defined through the pK values of the immobilized groups in the IPG-containing gel. To be useful for interlaboratory comparisons, however, the pH scale has to be defined through the pI values of easily recognizable spots present in the 2-D gel map. So far, pI determinations in a useful pH scale, combined with determinations of the pK values needed for pI calculations, have only been made for the pH range 4.5-6.5 at 10 °C [9]. CA-IEF as described by O'Farrell [11] does not control the temperature of the first dimension, which can be expected to be slightly above room temperature. With IPGs, the temperature commonly used is about 20 °C [4, 12] or 25 °C [13], and this is a critical parameter that needs to be controlled [14].
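The statement that an IPG can be described from the concentrations of its immobilized groups, once their pK values are known, can be made concrete with the standard Henderson-Hasselbalch relation. The sketch below is a minimal illustration under stated assumptions, not the procedure used in this work: it assumes a single buffering Immobiline titrated by a fully ionized counter-titrant, ignores activity effects, and the recipe concentrations are hypothetical. Real gradients mix several buffering groups and are designed with dedicated software.

```python
import math


def ipg_ph(pk: float, buffer_mm: float, titrant_mm: float, acidic_buffer: bool) -> float:
    """Henderson-Hasselbalch pH for one buffering Immobiline partially titrated by a
    strong (fully ionized) titrant.

    Assumption: each titrant group charges exactly one buffering group, so the
    charged fraction of the buffer is titrant_mm / buffer_mm.
    """
    if not 0.0 < titrant_mm < buffer_mm:
        raise ValueError("titrant must lie strictly between 0 and the buffer concentration")
    charged = titrant_mm / buffer_mm
    if acidic_buffer:
        # Weak acid, charged form A-:  pH = pK + log10([A-] / [HA])
        return pk + math.log10(charged / (1.0 - charged))
    # Weak base, charged form BH+:  pH = pK + log10([B] / [BH+])
    return pk + math.log10((1.0 - charged) / charged)


# Hypothetical recipe: 10 mM of a buffering group with pK 4.61 (the calibrated value
# of Immobiline pK 4.6 quoted in Section 2.4), titrated to different extents.
for titrant in (2.0, 5.0, 8.0):
    ph = ipg_ph(4.61, 10.0, titrant, acidic_buffer=True)
    print(f"{titrant:.0f} mM titrant -> pH {ph:.2f}")
```

A half-titrated buffer sits at its own pK whichever type it is, which is why the pK values of the immobilized groups fix the pH scale of the gel.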

The present work was designed to compare 2-D gel maps of different cell types in a laboratory applying both CA-IEF and IPG focusing at a common temperature. To this end, we have generated 2-D gel maps of proteins from noncultured, unfractionated normal human epidermal keratinocytes with IPG in the first dimension and a focusing temperature of 25 °C. We have used commercial nonlinear, wide-range IPG strips, which give 2-D gel maps that are closely similar to those obtained with the CA-IEF technique used to establish the human keratinocyte database [15]. As an initial step towards interlaboratory comparisons of results obtained with the nonlinear gradient as a first dimension, we report here on the focusing positions of 41 known proteins that are common to most human cell types. The pH range covered corresponds to the range in classical CA-IEF 2-D gel electrophoresis, and in order to use these proteins as internal standards for comparing 2-D gel maps generated with other IPGs, we determined their pI values with narrow-range IPGs in the first dimension. We have compared the calculated versus experimental pI values and show that, in addition to the amino acid composition, further information (absence or presence, and nature, of posttranslational modifications) is necessary to calculate pI values that correspond to the actual experimental values. The pK values used for the calculations are provided, and the usefulness of pI prediction in relation to database information is discussed. Furthermore, we comment on the possibility of using experimentally determined pI values to verify the available database information on polypeptide composition.



2 Materials and methods 

2.1 Apparatus and chemicals 

Equipment for isoelectric focusing and horizontal SDS electrophoresis (Multiphor II electrophoresis chamber, Immobiline strip tray, Multidrive XL programmable power supply, Macrodrive power supply and Multitemp II) was from Pharmacia LKB Biotechnology AB (Uppsala, Sweden). Vertical second-dimensional gels were run in the home-made equipment described in [15]. The IPG strips with the wide-range nonlinear pH gradient were either Immobiline DryStrip pH 3-10 NL, 180 mm long, or alternatively 160 mm long IPG strips with a corresponding pH gradient. In both cases the IPG strips were delivered by Pharmacia LKB. Immobiline, Pharmalyte, Ampholine, GelBond and PAG film, as well as the ready-made horizontal SDS gels (ExcelGel XL SDS 12-14), were also from Pharmacia LKB. Purified proteins and peptides were from Sigma (St. Louis, MO).

2.2 Sample preparation 

Preparation and labeling of unfractionated keratinocytes as well as fibroblasts have been described in [16]. Cells were lysed in a solution containing 9.8 M urea, 2% w/v NP-40, 100 mM DTT and 2% v/v Ampholine pH 7-9.

2.3 2-D gel electrophoresis 

First-dimensional focusing was performed according to Görg et al. [2] with some minor modifications, as described in [9]. Rehydration of the IPG strips was done in a solution containing 9.8 M urea, 2% w/v CHAPS, 10 mM DTT and 2% v/v carrier ampholyte mixture. The carrier ampholyte mixture consisted of 2 parts Pharmalyte pH 4-6.5, 1 part Ampholine pH 6-8 and 1 part Pharmalyte pH 8-10.5. Usually, cathodic sample application was used, and the samples were diluted 2-20 times in a solution containing 9.8 M urea, 4% w/v CHAPS, 1% w/v DTT and 35 mM Tris base. For anodic application, the Tris base was replaced by 100 mM acetic acid. The degree of dilution and the sample volume (20-100 µL) depended on the particular sample and IPG, and on whether the proteins were to be visualized by Coomassie Brilliant Blue or by silver staining. With the wide-range nonlinear IPG, 10-30 µg of total protein was loaded for silver staining and 100-200 µg for Coomassie staining. Focusing was done overnight, with Vh products in the range of 45-60 kVh for 160 mm long strips and 50-70 kVh for 180 mm long strips. Solubilization of polypeptides and blocking of -SH groups prior to the second-dimensional run, as well as loading on the second-dimensional gel, were done as described in [9]. The stacking gel was omitted, and 5-10 mm were left at the top of the second-dimensional gel for applying the IPG strip. This space was filled with electrode buffer containing 0.5% w/v agarose. Casting, running, staining and autoradiography were carried out as described in [15].
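Focusing is specified above as a volt-hour (Vh) product rather than as a time. The Vh product of a voltage program is the integral of voltage over time: for constant-voltage steps it is voltage × duration, and for a linear ramp it is the mean voltage × duration. The program in the sketch is hypothetical (the voltage steps actually used are not given here); it was chosen only to show a total that falls inside the quoted 50-70 kVh range for 180 mm strips.

```python
from typing import List, Tuple

# (start voltage in V, end voltage in V, duration in h); start == end means a constant-voltage step
Phase = Tuple[float, float, float]


def volt_hours(program: List[Phase]) -> float:
    """Total Vh product; a linear ramp contributes its mean voltage times its duration."""
    total = 0.0
    for v_start, v_end, hours in program:
        total += 0.5 * (v_start + v_end) * hours
    return total


# Hypothetical overnight program (not from the source): hold, ramp, then hold at 3500 V.
program = [(300, 300, 1.0), (300, 3500, 1.0), (3500, 3500, 14.0)]
kvh = volt_hours(program) / 1000
print(f"{kvh:.1f} kVh")
print("within 50-70 kVh:", 50 <= kvh <= 70)
```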

2.4 Experimental determination of pI values

The pK differences between Immobilines pK 4.6, pK 6.2 and pK 7.0 needed to calibrate the pH scale at 25 °C in 9.8 M urea were determined as described in [9], with the same narrow-range IPGs. The pH scale was defined by setting the pK value of Immobiline pK 4.6 equal to 4.61 [9], and the determined pK differences gave pK values of 5.73 and 6.54 for Immobilines pK 6.2 and pK 7.0, respectively. The pK differences found are in good agreement with values derived from [17] and [8] by extrapolation to 9.8 M urea. As in [9], additional narrow-range recipes have been used for determining pI values. With narrow-range IPGs extending to pH values higher than the pK value of Immobiline pK 7.0, anodic sample application was used, with acetic acid added to the sample solution. Otherwise, cathodic sample application was used with the same sample buffer as for wide-range IPGs.
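The calibration step above amounts to fixing one anchor pK and adding measured differences to it. The sketch shows that arithmetic; the two differences are back-calculated from the pK values quoted in the text (5.73 - 4.61 = 1.12 and 6.54 - 4.61 = 1.93) and are not separately reported there. The function name is illustrative.

```python
from typing import Dict


def calibrate_pk_scale(anchor: str, anchor_pk: float, deltas: Dict[str, float]) -> Dict[str, float]:
    """Place several Immobilines on one pH scale: fix the pK of an anchor group,
    then add each measured pK difference (relative to the anchor) to it."""
    scale = {anchor: anchor_pk}
    for name, delta in deltas.items():
        scale[name] = round(anchor_pk + delta, 2)  # two decimals, as quoted in the text
    return scale


scale = calibrate_pk_scale(
    "Immobiline pK 4.6", 4.61,
    {"Immobiline pK 6.2": 1.12, "Immobiline pK 7.0": 1.93},
)
for name, pk in scale.items():
    print(f"{name}: {pk:.2f}")
```

Because only differences are measured, changing the anchor value shifts every pK on the scale by the same amount; this is why the anchor (4.61 for Immobiline pK 4.6) must be stated with the results.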



2.5 Protein compositions used for pI calculations

With the exception of vimentin, protein compositions are from the Swiss-Prot database [18]. For vimentin, we used the data from [19], where the amino acid at position 41 is a D instead of an S. Information in the Swiss-Prot database on phosphorylation has been disregarded, because it was known from earlier studies (J. E. Celis, unpublished results) that the spots in question corresponded to the unphosphorylated forms of the peptides.
2.6 Calculation of pI values

For the pI calculations it was assumed that the same pK value could be used for an amino acid residue in all polypeptides and in all positions in the peptide, except for N- or C-terminally placed amino acids. For the pK values of the N-terminal amino groups, the effect of the different substituents on the α-carbon was taken into account. The pI values were calculated with the aid of the IPG-maker program [20].

2.7 pK values used for pI calculations

For the carboxyl terminal group and internal glutamyl and aspartyl residues, the same pK values were used as in [9]. For C-terminal glutamyl and aspartyl residues, separate pK values were derived with the aid of the Taft equations [9, 21]. The pK values of histidyl groups were calculated from the pI values of human carbonic anhydrase I as in [9]. For N-terminal glycine, a pK value of 7.50 was used. The pK shift caused by a substituent on the α-carbon was assumed to be identical with the pK shift the substituent caused for the amino group in the amino acid, i.e., 2.28 pH units were subtracted from the pK values for the amino groups in the amino acids given in [22, 23]. The approximate pK value of 9 for the cysteinyl group was taken from [24]. For tyrosyl and arginyl groups, we used the pK values for the amino acids [22, 23]. For lysyl groups, the effect of high urea concentration on amino groups was taken into account, and 0.5 pH units were subtracted from the amino acid pK value. These last three pK values are far from the pH range under study, and the results would have been the same if lysyl and arginyl groups had been assumed to be fully ionized and the ionization of tyrosyl groups had been neglected. A complete list of the pK values used is given in Table 1.

Table 1. pK values used for the ionizable groups in peptides (9.8 M urea, 25 °C)



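The calculation described in Sections 2.6 and 2.7 (one average pK per ionizable group type, a separate pK for the N-terminal amino group, and a blocked N-terminus contributing no charge) can be sketched as a bisection on the net-charge curve. This is a minimal illustration, not the IPG-maker program: the side-chain and C-terminal pK values are placeholders, not the Table 1 values; C-terminal Asp/Glu side chains are not treated separately; and the test peptide is made up. The one value taken from the text is the 2.28 shift, with 9.78 used only because 7.50 + 2.28 = 9.78 reproduces the stated N-terminal glycine pK.

```python
from typing import Dict, Optional

# Placeholder side-chain pK values for illustration only. They are NOT the values
# of Table 1 (9.8 M urea, 25 °C); substitute those for real work.
PLACEHOLDER_SIDE_CHAIN_PK: Dict[str, float] = {
    "D": 4.0, "E": 4.4, "H": 6.0, "C": 9.0, "Y": 10.0, "K": 10.0, "R": 12.0,
}
ACIDIC_SIDE_CHAINS = frozenset("DECY")  # negative when deprotonated

N_TERMINAL_SHIFT = 2.28  # Section 2.7: N-terminal pK = amino acid amino-group pK - 2.28


def n_terminal_pk(free_amino_acid_amino_pk: float) -> float:
    return free_amino_acid_amino_pk - N_TERMINAL_SHIFT


def net_charge(seq: str, ph: float, side_pk: Dict[str, float],
               c_term_pk: float, n_term_pk: Optional[float]) -> float:
    """Net charge of a denatured chain from average pK values.
    n_term_pk=None models a blocked (e.g. acetylated) N-terminus, which carries no charge."""
    z = -1.0 / (1.0 + 10.0 ** (c_term_pk - ph))          # C-terminal carboxyl group
    if n_term_pk is not None:
        z += 1.0 / (1.0 + 10.0 ** (ph - n_term_pk))      # free N-terminal amino group
    for residue in seq.upper():
        pk = side_pk.get(residue)
        if pk is None:
            continue                                     # not ionizable in this model
        if residue in ACIDIC_SIDE_CHAINS:
            z -= 1.0 / (1.0 + 10.0 ** (pk - ph))
        else:
            z += 1.0 / (1.0 + 10.0 ** (ph - pk))
    return z


def isoelectric_point(seq: str, side_pk: Dict[str, float], c_term_pk: float,
                      n_term_pk: Optional[float], lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-6) -> float:
    """Bisection on pH; the net charge falls monotonically with pH, so the zero crossing is unique."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(seq, mid, side_pk, c_term_pk, n_term_pk) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Made-up peptide, placeholder C-terminal pK of 3.0, N-terminal glycine pK from Section 2.7.
gly_n_term = n_terminal_pk(9.78)
peptide = "GAKDEHLK"
free = isoelectric_point(peptide, PLACEHOLDER_SIDE_CHAIN_PK, 3.0, gly_n_term)
blocked = isoelectric_point(peptide, PLACEHOLDER_SIDE_CHAIN_PK, 3.0, None)
print(f"free N-terminus: pI {free:.2f}; blocked N-terminus: pI {blocked:.2f}")
```

Removing the N-terminal positive charge always lowers the calculated pI, which is the effect that makes unannotated N-terminal blockage a problem for pI prediction.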
2.8 Statistical analysis

Statistical comparisons of the experimental and calculated pI values were done on an Apple Macintosh IIsi using the statistical package Statistica/Mac, release 3.0b (StatSoft Inc., Tulsa, Oklahoma). Calculated and experimental pI values were compared by the t-test for correlated samples (paired t-test). The normality of the pI differences was assessed graphically by probability plots. The variances of the data presented here and of the similar data on plasma and liver proteins in [9] were compared by the F-test.
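The two tests named above can be sketched with the Python standard library. This is an illustrative re-implementation, not the Statistica procedure. It computes only the test statistics: converting t (with n - 1 degrees of freedom) and F into p-levels requires the t and F distributions, for example `scipy.stats.t.sf` and `scipy.stats.f.sf` if SciPy is available; `scipy.stats.ttest_rel` also performs the paired test in one call. The example pI values are made up.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(calculated: Sequence[float], experimental: Sequence[float]) -> Tuple[float, int]:
    """t statistic and degrees of freedom for a paired (dependent-samples) t-test
    on the differences calculated - experimental."""
    if len(calculated) != len(experimental) or len(calculated) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [c - e for c, e in zip(calculated, experimental)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / std_err, len(diffs) - 1


def variance_ratio(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """F statistic comparing two sample variances (larger variance in the numerator)."""
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)


# Made-up pI values, for illustration only.
calc = [5.02, 5.48, 6.11, 4.87, 5.66]
expt = [5.00, 5.50, 6.10, 4.90, 5.64]
t, df = paired_t(calc, expt)
print(f"t = {t:.3f} with {df} degrees of freedom")
```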

3 Results and discussion 

3.1 Identification of polypeptides and pI determinations

The 2-D gel maps of [35S]methionine-labeled proteins from noncultured, unfractionated normal human keratinocytes, focused with the nonlinear, wide-range IPG and with CA-IEF pH gradients in the first dimension, are shown in Figs. 1 and 2, respectively. The IPG extends to higher pH values, but otherwise the two patterns are very similar, and most of the spots in the IPG pattern can be directly related to the corresponding spots in the CA-IEF gel. To obtain comparable patterns, it was important to keep the focusing temperature as similar as possible. Compared with other studies [1-4, 9, 10, 12-14], we increased the urea concentration in the focusing gel to 9.8 M because keratins streaked badly in the focusing dimension when 8 M urea was used, presumably due to
Figure 1. 2-D gel protein map of [35S]methionine-labeled proteins from noncultured, unfractionated normal human keratinocytes focused with the nonlinear, wide-range IPG in the first dimension. The positions of the 41 proteins analyzed in this study are indicated.
aggregates of acidic and basic keratins. An increase in urea concentration to 9 M or more eliminated these streaks; apart from this effect, no other major changes in the focusing positions were observed. In Fig. 1 we have indicated the positions of 41 known proteins from the human keratinocyte 2-D gel database that are most likely common to most human cell types. These proteins were chosen because they are easy to identify with certainty. With the exception of stratifin (spot 2), involucrin (spot 4) and keratin 14 (spot 15), which are all epithelial markers, these proteins are also present in human fibroblasts (Fig. 3) and lymphocytes (results not shown), and can therefore be used as landmarks for comparing 2-D gel maps derived from different cell types. In Table 2 the 41 proteins are listed together with their sample spot numbers (SSP) in the human keratinocyte protein database and the pI values determined in 2-D gel maps generated with narrow-range IPGs in the first dimension.
Figure 2. 2-D gel protein map of [35S]methionine-labeled proteins from noncultured, unfractionated normal human keratinocytes focused with CA-IEF in the first dimension. The positions of the 41 proteins analyzed in this study are indicated.
3.2 Comparison between the determined and calculated pI values for human keratinocyte proteins

Thirty-six of the 41 proteins listed in Table 2 are found in the Swiss-Prot database. In contrast to the plasma and liver proteins used in [9], the pI calculations on the proteins used in this study posed some problems that reflected the way in which they were characterized. The proteins used by Bjellqvist et al. [9] were either very abundant and well-characterized plasma proteins or had been identified by N-terminal sequencing; therefore, in both cases the nature of the N-termini (acetylated or non-acetylated) was known. The proteins used in this study have all been characterized by internal sequencing [7], and it is known that N-terminal acetylation occurs with high frequency in eukaryotes.
Figure 3. 2-D gel protein map of [35S]methionine-labeled proteins from normal human fibroblasts focused with the nonlinear, wide-range IPG in the first dimension. The positions of the 41 proteins analyzed in this study are indicated.
According to Brown and Roberts [25], proteins with acetylated N-termini account for approximately 80% by weight of the soluble protein in ascites cells. Based on results from N-terminal sequencing, at least 40% of the spots in the human liver protein 2-D gel map appear to be blocked [3]. The corresponding figure, derived from 107 spots in the 2-D gel map of human T-lymphocyte proteins, falls between 60 and 65% (J. Strahler, personal communication). Information concerning N-terminal blockage is not normally available, and in the Swiss-Prot database only 6 of the 36 keratinocyte proteins are specified as N-terminally blocked. Within the present material, we have defined 18 proteins for which the N-termini are very likely to be correctly described. Six of these proteins are listed in the Swiss-Prot database as N-terminally blocked, four are proteins that appear in the human liver 2-D gel map and have been N-terminally sequenced as liver proteins [3], and the remaining eight have N-terminal groups other than M, S and A, i.e., N-termini for which N-acetylation is uncommon [26]. In Figs. 4A, B, C and D, pI values calculated from Swiss-Prot database information are plotted against the experimentally determined pI values for all the keratinocyte proteins listed in Table 2 and for the 18 selected proteins, as well as for the plasma and liver proteins from [9], valid for 10 °C.
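The dependence of blockage on the N-terminal residue reported in this study can be turned into a crude rule of thumb, sketched below. It is a heuristic, not a predictor: with M, S or A the observed rate was 17-19 of 26, so roughly a quarter to a third of such proteins would be misclassified, which is why the 18 selected proteins were defined from annotation and sequencing data rather than from a rule. The sketch assumes the sequence starts at the residue actually found at the N-terminus of the mature protein; the function name is illustrative.

```python
LIKELY_ACETYLATED_N_TERMINI = frozenset("MSA")


def likely_n_terminally_blocked(sequence: str) -> bool:
    """Rule of thumb drawn from the counts in this study: treat a chain whose first
    residue is M, S or A as N-terminally blocked, and anything else as free."""
    return bool(sequence) and sequence[0].upper() in LIKELY_ACETYLATED_N_TERMINI


# Observed rates quoted in the abstract: 17-19 of 26 blocked with M/S/A, 1 of 10 otherwise.
print(f"M/S/A N-termini blocked: {17 / 26:.0%}-{19 / 26:.0%}")
print(f"other N-termini blocked: {1 / 10:.0%}")
```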

The calculations show that without knowledge of the status of the N-terminal group, precise predictions of pI values for eukaryotic proteins cannot be achieved based on the information available in Swiss-Prot and similar databases. However, for proteins where the N-terminal status is known, we find good correlation between predicted and experimental pI values. When the variance of the pI discrepancies and the variance of calculated charges at the experimental pI values derived from the present data set are compared with the corresponding



Figure 4. Calculated vs. experimental pI values. Lines are fitted using the least-squares criterion. (A) 36 polypeptides from normal human keratinocytes (no corrections). (B) The 36 polypeptides from Fig. 4A (including the 18 marker polypeptides), where pI values have been recalculated for 12 polypeptides with M, S or A as N-terminal residue, assumed to be blocked on the basis of calculated charge; x indicates recalculated pI values; nucleolar protein B23 is indicated with an arrow. (C) 18 polypeptides with information on N-terminal configuration. (D) Plasma and liver proteins.
values derived from the data on plasma and liver proteins in [9] (Table 3), the present data are found to result in larger variances for both the pI discrepancies and the calculated charge at the experimental pI value when no information on posttranslational modification is taken into consideration. Correction for possible N-acetylation of 12 polypeptides with M, S or A as N-terminal residue results in a smaller variance of the pI discrepancies, although not significantly different from values derived from [9], whereas the variance of the calculated charge at the experimental pI value is significantly higher. For the 18 selected proteins, the variance of the pI discrepancies is significantly smaller than for the data in [9]; however, the corresponding value for the calculated charge at the experimental pI value does not improve to the same extent. This, we believe, reflects another difference between the two sets of proteins used for the calculations. Based on spot distributions in 2-D gel maps, the set of proteins used here has a molecular weight distribution that is more representative of the patterns observed in mammalian cells. In the study by Bjellqvist et al. [9], most of the high molecular weight plasma proteins had to be excluded owing to their unknown content of sialic acid, which made the set of proteins analyzed in that study heavily biased towards low molecular weight proteins. The buffer capacity of proteins normally increases with molecular weight, and the average buffer capacity of the presently selected proteins with assumed known N-termini is 18 charge units/pH unit, while the corresponding value for the proteins used in [9] is only 9 charge units/pH unit. High buffer capacity can be expected to improve the agreement between calculated and experimental pI values. Inspection of the data presented in Table 2 for the polypeptides with assumed known N-termini confirms the importance of the buffer capacity. For the 8 polypeptides with buffer capacities higher than 15 charge units/pH unit, the calculations in all cases yielded pI discrepancies with absolute values of less than 0.02 pH units. The largest discrepancy, 0.06 pH units, was observed for annexin II and stathmin, proteins with low buffer capacities of 0.9 and 6.6 charge units/pH unit, respectively. The probability that the focusing position of a protein with known composition will fall within a certain distance of the calculated pI value therefore cannot be predicted from the variance alone; the buffer capacity of the specific protein must be taken into consideration as well. As indicated by the decrease of the variance of calculated charges at the experimental pI value for the selected proteins, the observed improvement cannot be due solely to the higher buffer capacity of the keratinocyte proteins. The two studies relate to different experimental conditions. Good agreement between experimental and calculated pI values implies that the proteins are unfolded, and a factor that may contribute to the observed improvement is a more complete unfolding of the proteins caused by the higher temperature and urea concentration used in this study.
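Buffer capacity, in the charge units/pH unit used above, is the slope of the titration curve, -dZ/dpH, at the pI. The sketch computes it in two ways from a list of ionizable groups: by a central finite difference of the net charge, and in closed form as ln(10) times the sum of α(1 - α) over all groups, where α is the deprotonated fraction of each group. The group list in the example is hypothetical; it only shows that many groups with pK values near the pI give a high buffer capacity.

```python
import math
from typing import Sequence, Tuple

Group = Tuple[float, bool]  # (pK, True if acidic, i.e. negative when deprotonated)


def net_charge(groups: Sequence[Group], ph: float) -> float:
    z = 0.0
    for pk, acidic in groups:
        if acidic:
            z -= 1.0 / (1.0 + 10.0 ** (pk - ph))
        else:
            z += 1.0 / (1.0 + 10.0 ** (ph - pk))
    return z


def buffer_capacity(groups: Sequence[Group], ph: float, h: float = 1e-4) -> float:
    """-dZ/dpH in charge units per pH unit, by a central finite difference."""
    return -(net_charge(groups, ph + h) - net_charge(groups, ph - h)) / (2.0 * h)


def buffer_capacity_analytic(groups: Sequence[Group], ph: float) -> float:
    """Same quantity in closed form: ln(10) * sum of alpha * (1 - alpha) over all groups."""
    total = 0.0
    for pk, _ in groups:
        alpha = 1.0 / (1.0 + 10.0 ** (pk - ph))  # deprotonated fraction of the group
        total += alpha * (1.0 - alpha)
    return math.log(10.0) * total


# Hypothetical protein: acidic groups near pH 4.4, histidyl-like groups near 6.0, lysyl-like at 10.
groups = [(4.4, True)] * 10 + [(6.0, False)] * 8 + [(10.0, False)] * 12
print(f"buffer capacity at pH 6.3: {buffer_capacity(groups, 6.3):.2f} charge units/pH unit")
```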

The data indicate that the precision with which pI values can be predicted for polypeptides with high buffer capacity is better than the precision with which experimental pI values can be determined. If the pH is defined through the pK values of the immobilized groups in the IPG-containing gel, the precision of the experimentally determined data will depend on the pH difference between the pI and the pK value of the immobilized group with the closest pK. For the present study, this gives pI determinations with a precision varying in the range of ±0.02-0.05 pH units [9]. The good agreement observed between the calculated and experimental pI values is due to the fact that the errors are mainly systematic and, as discussed in [9], they largely cancel out in the calculations. A pH scale defined through the presently determined pI values will not necessarily reflect the variation of the hydrogen ion activity during the focusing step in an optimal way, but it still allows precise predictions of focusing positions for polypeptides with known compositions, including information on posttranslational modifications. The calculated net charge at the experimentally found isoelectric point, defined in this scale, will serve as a tool to verify that the polypeptide composition used in the calculation is correct and complete. Exceptions to this are proteins such as involucrin and heat shock protein 90, which have very high buffer capacities. Introducing an extra charge unit into these proteins will only result in pI shifts in the range of 0.01-0.02 pH units; in these cases, the quality of the pH definition (the precision with which the pK values used in the calculations are known) and the precision of the experimental pI values will limit the possibility of verifying polypeptide composition from the experimental pI value.
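The argument above rests on the first-order relation ΔpI ≈ ΔZ/β, where ΔZ is a change in charge and β the buffer capacity at the pI. The sketch uses it to interpret the calculated net charge at the experimental pI. The helper is hypothetical: the 0.5-charge threshold is an assumption of this sketch, the default precision of 0.02 pH units is the best case quoted above, and the buffer capacity of 80 used for an involucrin-like protein is an invented value. The B23 figure is simply 3 charge units divided by the 0.11 pH units reported for that protein below; as that case shows, a flag calls for inspection, since the cause may be the model rather than the composition.

```python
def pi_shift(delta_charge: float, buffer_capacity: float) -> float:
    """First-order estimate of the pI shift caused by adding delta_charge charge units.
    Only valid while the shift is small enough for the buffer capacity to stay roughly constant."""
    return delta_charge / buffer_capacity


def composition_check(charge_at_exp_pi: float, buffer_capacity: float,
                      pi_precision: float = 0.02) -> str:
    """Interpret the calculated net charge at the experimental pI (hypothetical helper)."""
    if abs(pi_shift(1.0, buffer_capacity)) <= pi_precision:
        return "inconclusive: one charge unit moves the pI less than the measurement precision"
    if abs(charge_at_exp_pi) >= 0.5:  # threshold assumed for this sketch
        return f"suspect: calculated net charge {charge_at_exp_pi:+.1f} at the experimental pI"
    return "consistent with the assumed composition"


b23_capacity = 3 / 0.11  # implied by the B23 figures given below
print(f"B23: about {b23_capacity:.0f} charge units/pH unit")
print(composition_check(3.0, b23_capacity))
print(composition_check(1.0, 80.0))  # invented, involucrin-like buffer capacity
```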

Statistical comparison of experimental and calculated pI values was done using the t-test for dependent samples, and normality of the discrepancies was assessed by probability plots. For the 36 proteins, the p-level is 0.0021, indicating that a result like this is unlikely to be a chance effect and must be assumed to represent a real difference. After correction for the most likely N-terminal configuration, the p-level is 0.043; since this is below 0.05, the traditional limit of statistical significance, the calculated and experimental values still cannot be accepted as representing the same population. For the 18 proteins with a known or very likely N-terminal configuration, the t-test gave a p-level of 0.49, indicating that the experimental and calculated pI values are not significantly different.

Besides showing that pI values for denatured proteins
with known compositions can be calculated with a high
degree of precision from average pK values, the results
also provide strong support for the notion that
N-terminal blockage depends heavily on the nature of
the N-terminal group [26]. The results seem to indicate
that with N-terminals other than M, S and A, only a few
proteins have blocked N-terminals (1 out of 10 proteins
in the present study), while it can be inferred from the
data presented in Table 2 that a majority of the proteins
with M, S or A as N-terminal are blocked. After correction
for the effect of suspected N-terminal blockage,
there is only one protein (nucleolar protein B23) out of
the 36 used in this study which, in spite of a high buffer
capacity, has a marked difference of 0.11 pH units
between predicted and determined pI values (Fig. 4B);
owing to the high buffer capacity of this protein, this
corresponds to 3 charge units. This discrepancy between
predicted and determined pI, and in the net charge
calculated at the pI, is probably not
due to deficiencies in the database information but
instead reflects a shortcoming of the model used for pI
calculations. Nucleolar protein B23 contains a domain
extremely rich in aspartic and glutamic acid residues
(Table 4), in which 26 out of 28 amino acid residues
from position 161 to 188 are either D or E. A calculation
based on average pK values, uninfluenced by charged
neighboring amino acid residues, cannot be expected to
correctly describe the pI
value with almost half of the acidic groups packed
together into a highly negatively charged region. This
limitation of calculations based on average pK
values does not severely restrict the usefulness of the
approach, since a search through Swiss-Prot shows that
this type of D/E-rich motif is uncommon, and the existence
of a highly charged region is immediately apparent
upon inspection of the amino acid sequence.

Table 4. Amino acid sequence of nucleolar phosphoprotein B23
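
The D/E-rich stretch described above for nucleolar protein B23 (26 of 28 residues from position 161 to 188) is the kind of feature that can be flagged automatically before a pI computed from average pK values is trusted. A minimal sliding-window sketch follows; the window length of 28 mirrors the B23 stretch, while the threshold of 24 acidic residues is an assumption chosen for illustration, not a criterion from this study, and the function name is hypothetical.

```python
def acidic_windows(seq: str, window: int = 28,
                   min_acidic: int = 24) -> list[tuple[int, int, int]]:
    """Return (start, end, n_acidic) for every window with >= min_acidic D/E residues.

    Positions are 1-based and inclusive, matching sequence-numbering conventions.
    """
    hits: list[tuple[int, int, int]] = []
    if len(seq) < window:
        return hits
    count = sum(aa in "DE" for aa in seq[:window])
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide right by one: add the entering residue, drop the leaving one.
            count += (seq[start + window - 1] in "DE") - (seq[start - 1] in "DE")
        if count >= min_acidic:
            hits.append((start + 1, start + window, count))
    return hits
```

Any window returned marks a region where neighbor effects on the acidic pK values are likely, so an average-pK prediction for that protein should be treated with caution.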

The quality of the information available in databases,
especially concerning posttranslational modifications, is
a major problem when the data are to be used for pI
predictions. The p-level of 0.043 found for all 36 proteins
after correction for N-terminal acetylation shows that this
problem is not limited to N-terminal blockage, and the
very good agreement found for the eighteen polypeptides
with presumably correctly described N-terminals
(Fig. 4C) must be regarded as an exception from this
point of view. N-terminal blockage is generally the main
problem in relation to pI predictions for eukaryotic
proteins. Of the 36 keratinocyte proteins analyzed, 18-20
are suspected to be N-terminally blocked (6 proteins blocked
according to Swiss-Prot, 12 proteins with M, S or A
as N-terminal and presumably blocked based on the calculated
charge, and two proteins, involucrin and
nucleolar protein B23, with M as N-terminal, for which
the data do not allow any conclusion). This is in reasonable
agreement with the conclusions based on the
N-terminal sequencing data obtained in connection with
2-D gel electrophoresis. N-terminal blockage can be
suspected for 17-19 of the 26 proteins with M, S or A as
N-terminal, while only 1 in 10 proteins with other
N-terminal groups is blocked. The finding that the
frequency of N-terminal blockage is strongly related to
the nature of the N-terminal group will be of some help
for pI predictions based on database
information. However, without information from other
sources, some uncertainty will always remain as to whether
the N-terminal charge should be included in the pI
calculation.
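
The blockage frequencies reported above can be turned into a simple rule for deciding which N-terminal configuration to try first when predicting a pI from a database entry. The sketch below is hypothetical: it encodes only the counts from this study (17-19 of 26 proteins with M, S or A; 1 of 10 otherwise), it assumes the first residue of the sequence as given is the mature N-terminus, and it returns an ordering of flags for whether the N-terminal charge is included, rather than a decision, since the uncertainty noted above remains.

```python
from fractions import Fraction

# Blockage frequencies observed among the 36 keratinocyte proteins in this study.
BLOCKED_FRACTION = {
    "MSA": (Fraction(17, 26), Fraction(19, 26)),  # 17-19 of 26 blocked
    "other": (Fraction(1, 10), Fraction(1, 10)),  # 1 of 10 blocked
}


def blockage_range(seq: str) -> tuple[Fraction, Fraction]:
    """Observed range of blocked fractions for this N-terminal residue class."""
    if not seq:
        raise ValueError("empty sequence")
    key = "MSA" if seq[0].upper() in "MSA" else "other"
    return BLOCKED_FRACTION[key]


def nterm_charge_order(seq: str) -> list[bool]:
    """Order in which to try 'include N-terminal charge': most likely first."""
    low, _ = blockage_range(seq)
    likely_blocked = low > Fraction(1, 2)
    # A blocked N-terminus carries no charge, so False (exclude) comes first.
    return [False, True] if likely_blocked else [True, False]
```

Computing the pI under both configurations and comparing each with the experimental value is one way to use the ordering; the rule only says which configuration is more probable a priori.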



4 Concluding remarks 

The data presented here lay the foundation for comparing
2-D gel protein maps of different cell types generated
with nonlinear, wide-range IPGs in the first dimension.
The focusing positions of 41 polypeptides common
to most human cell types have been described in a pH
scale that allows focusing positions to be predicted with
a high degree of accuracy, provided that the composition
of the polypeptides is known and that information on
posttranslational modifications is available. For polypeptides
with a very high buffer capacity, the limiting
factor is the precision with which experimental pH
values can be determined rather than the precision of
the calculations. Possible deficiencies in how the pH scale
describes the variation of the hydrogen ion activity
have, at least at present, no consequences for its
practical use. The major limitation in predicting
focusing positions from polypeptide compositions is the
quality of existing data on protein compositions,
especially concerning posttranslational modifications.
Amino acid sequences have been reasonably
easy to obtain, while posttranslational modifications
have been difficult and work-intensive to determine.
Recent developments in the field of mass spectrometry
are fast changing this situation, and within the next years
we can expect a surge in reliable data in this area. While
awaiting this development, the correctness and completeness
of available information on polypeptide composition
can be verified by experimental pI
values in a pH scale based on the pI values determined
in this study. So far, our data cover the pH range below
approximately pH 7.5. The basic pH range covered by NEPHGE as
first dimension will be addressed in forthcoming work.

Received December 29, 1993

Large Scale Biology Corporation is the leader in the integrated discovery, production
and application of proteins - the functional units of all biological processes. 

Large Scale Biology Corporation (LSB, Vacaville, CA) and its subsidiary Large Scale
Proteomics Corp. (LSP, Germantown, MD) form a biotechnology enterprise whose mission is to
accelerate the speed and productivity of product discovery and development programs in the
life sciences industry. Unique among biotechnology companies is LSB's integration of
technologies to discover, analyze, manufacture and find new applications for proteins - the 
functional units of all biological processes. 

Genomics companies have focused on deciphering genetic information, providing an initial but 
only partial understanding of biological processes. LSB's proprietary protein technologies can 
enable the transformation of genomic information into products such as drug targets, 
therapeutics, diagnostics for drug efficacy and toxicity, and traits for agricultural crops. Large 
Scale Biology has gone beyond the "genomics" realm in its business model and developed 
ways to integrate the discovery of gene function with quantitative protein analysis and protein 
manufacturing. This integration of technology platforms favorably positions LSB as a leading 
provider of valuable content to industry leaders in the fields of diagnostics, therapeutics, 
vaccines and agribusiness. 

LSB was founded in 1987 with the goal of commercializing its proprietary GENEWARE viral 
vector system - a novel technology for gene expression. Using safe RNA viruses to transiently 
express genes in non-recombinant plants, LSB has positioned itself in the industry to provide 
cost-effective manufacturing and purification of diverse protein and peptide products. The 
same technology can be applied to the expression of libraries of foreign genes in an 
automated, high-throughput format to discover the function of genes with unparalleled 
efficiency. The GENEWARE system and associated proprietary technologies form the basis 
for LSB's functional genomics, biomanufacturing and a variety of proprietary products under 
development. 

From its foundation, LSB understood the need to integrate functional genomic and protein 
manufacturing expertise with quantitative protein analysis and informatics to become a 
world-leader in the protein field. In 1999, LSB acquired a privately held pharmaceutical 
proteomics company originally founded in 1985. Large Scale Proteomics Corporation (a wholly 
owned subsidiary of Large Scale Biology Corporation) is an industry leader in identifying and
characterizing proteins in all types of biological samples for the discovery and development of 
new and more effective therapies, diagnostics, and agricultural products. 

"Proteomics" is the study of the entire complement of proteins expressed in a cell, tissue, or 
organism. Proteomics can significantly improve drug discovery and development because 
most illness is associated with imbalances among, or malfunctions of, proteins. Only a small 
fraction of diseases can be attributed to the presence of a defective gene. Unlike classical 
genomics approaches that discover genes that may relate to a disease, LSP has developed a 
proprietary system called the ProGEx module for directly characterizing proteins associated 
with disease. Using this same technology, LSP can characterize the effects of candidate drugs
intended to reverse a disease process and determine the degree to which this objective is
achieved without adverse side effects.

LSB and LSP have protected their many discoveries through an extensive portfolio of domestic
and foreign patents and have developed commercial alliances and partnerships to exploit the 
value of their technologies. LSB and LSP scientists and engineers focus on the development 
and application of resources to help clients meet their objectives as well as the development of 
our own proprietary products for subsequent partnering with industry leaders. 

A combined staff of 140 professionals operates from three locations in the United States, with 
a network of collaborators and affiliates throughout the US and Europe. Company 
headquarters, R&D laboratories and its Genomics division are located in Vacaville, California 
about 60 miles northeast of San Francisco. Process development and biomanufacturing take 
place in Owensboro, Kentucky, and LSB's Large Scale Proteomics Corporation subsidiary is 
located in Germantown, Maryland. 

In August, 2000, LSB completed an initial public offering (IPO) of 5 million shares of common 
stock and now trades on the NASDAQ under the symbol LSBC. 

Leadership - Large Scale Biology Corporation 

Robert L Erwin, Chairman of the Board and Chief Executive Officer, founded LSB™ and has 
served as a director and officer since 1987. Mr. Erwin is the former chairman of the State of 
California Breast Cancer Research Council and currently serves on the University of California 
President's Engineering Advisory Council. He is Chairman of the Supervisory Board of Icon 
Genetics AG. As a co-founder of Sungene Technologies Corp., Mr. Erwin served as Vice 
President of Research and Product Development from 1981 through 1986. He has served on 
the Biotechnology Industry Advisory Board for Iowa State University. Mr. Erwin received his 
M.S. degree in Genetics from Louisiana State University and is an inventor on several LSB 
patents. 

David R. McGee, Ph.D., a co-founder of LSB and Senior Vice President and Chief Operating
Officer, has been an officer since 1987. Prior to joining LSB, Dr. McGee was Vice President of 
Operations at Sungene Technologies Corporation from 1983 to 1987. Dr. McGee received his 
Ph.D. in Genetics from Louisiana State University and served as a faculty instructor of zoology 
and genetics at Louisiana State University. 

Laurence K. Grill, Ph.D., a co-founder of LSB and Senior Vice President, Research and
Development, has served as an officer since 1987. Dr. Grill was the Manager of Plant 
Molecular Biology for Sandoz Crop Protection Corp. from 1984 to 1987 and Senior Research 
Scientist in the Department of Molecular Biology at Zoecon Research Institute from 1980 to
1984. He received his Ph.D. from the University of California at Riverside with an emphasis on 
the molecular basis for viral gene expression in plants. 

R. Barry Holtz, Ph.D., Senior Vice President, Biopharmaceutical Manufacturing, has served
the company as an officer since 1989 upon the acquisition of Holtz Bio-Engineering, which
was founded in 1980. Dr. Holtz was a co-founder and Director of Research for MFI, Inc., the
largest manufacturer of microencapsulated nutrients for agriculture, and Director of
Fundamental Research at Foremost-McKesson, Inc. Dr. Holtz received his Ph.D. in
Biochemistry from Pennsylvania State University and served as Assistant Professor in the 
Department of Food Science and Nutrition at Ohio State University. 

Daniel Tuse, Ph.D., has been an officer of LSB since he joined the Company in 1995 as Vice 
President, Pharmaceutical Development. Dr. Tuse manages the company's pharmaceutical 
design and development programs, including LSB's novel vaccines and immunotherapeutics 
initiatives. Prior to joining LSB, Dr. Tuse was Assistant Director of SRI International's (Menlo 
Park, Calif.) Life Sciences Division. In his 17 years at SRI, Dr. Tuse developed extensive R&D 
experience in pharmaceuticals and specialty chemicals, serving an international list of clients. 
Dr. Tuse received his Ph.D. in Microbiology (1980, cum laude) with a minor in Toxicology from 
the University of California, Davis. 

John S. Rakitan, a co-founder of LSB, Senior Vice President & General Counsel and 
Secretary, has served as an officer since 1988. Prior to joining LSB, Mr. Rakitan was an 
attorney in private practice. Mr. Rakitan received his J.D. degree from the University of Notre 
Dame. 

Michael D. Centron, Treasurer, has served as Controller since 1988 and was elected as 
Treasurer in 1991. Mr. Centron was Audit Supervisor for Varian Associates from June 1985
through July 1988, and he also worked for Arthur Young and Co. (currently Ernst & Young). 
Mr. Centron is a certified public accountant and received his M.B.A. degree from the University 
of California at Berkeley. 

Guy della-Cioppa, Ph.D., is an officer of the company and currently serves as Vice President, 
Genomics. Prior to joining the company in 1989, Dr. della-Cioppa worked for Monsanto 
Company in St. Louis, MO from 1984-1989 and was an NIH Postdoctoral Fellow at the 
Worcester Foundation for Experimental Biology in Shrewsbury, MA from 1983-1984. He 
received his Ph.D. in Biology from the University of California, Los Angeles. 

William M. Pfann joined Large Scale Biology in August 2000 as Senior Vice President, Finance
and Chief Financial Officer. Mr. Pfann was formerly with PricewaterhouseCoopers LLP from 
1969 to July 2000, most recently as the Risk Management Partner for the Western Region. He 
served in a number of management roles at PwC, including leader of the firm's Silicon Valley 
audit practice, National Director of the networking and communications sector and Managing 
Partner of the Northern California emerging business group, as well as Partner-in-Charge of 
the Oakland and Walnut Creek, California offices. Mr. Pfann received a B.S. degree from the 
University of California, Berkeley, in Business Administration and an MBA in Accounting from 
Golden Gate University. 
Leadership - Large Scale Proteomics Corporation 

N. Leigh Anderson, Ph.D., Chairman, President and CEO of Large Scale Proteomics
Corporation (LSP™). Dr. Anderson obtained his B.A. in Physics with honors from Yale and a 
Ph.D. in Molecular Biology from Cambridge University (England) working with M. F. Perutz as 
a Churchill Fellow at the MRC Laboratory of Molecular Biology. Subsequently he co-founded 
the Molecular Anatomy Program at the Argonne National Laboratory (Chicago) where his 
work in the development of 2-dimensional electrophoresis (2-DE) and molecular database 
technology earned him, among other distinctions, the American Association for Clinical 
Chemistry's Young Investigator Award for 1982 and the 1983 Pittsburgh Analytical Chemistry 
Award. In 1985 Dr. Anderson co-founded LSP (originally Large Scale Biology Corp., 
Germantown, MD) in order to pursue commercial development and large-scale applications 
of 2-D electrophoretic protein mapping technology. 

Norman G. Anderson, Ph.D., Chief Scientist at LSP. Dr. Anderson has a distinguished record 
as an inventor. His career includes senior positions at Oak Ridge and Argonne National 
Laboratories (ORNL and ANL), more than 300 scientific publications, and the receipt of more 
than 20 prestigious awards in recognition of his work in science and technology. For his 
invention of the zonal ultracentrifuge, he received the John Scott Medal Award, and for the 
centrifugal fast analyzer, the Preis Biochemische Analytik für Klinische Chemie from Die
Deutsche Gesellschaft für Klinische Chemie for the most outstanding analytical development
in clinical chemistry worldwide during a 2-year period. In 1984 ANL awarded him its career
patent leader award for the largest number of patents issued to an employee. At that time the
commercial value of his inventions amounted to $250 million in U.S. sales and $1 million in
royalties from foreign licensing. Dr. Anderson received his degrees at Duke
University: a B.A. in Zoology, M.A. in Physiology, and Ph.D. in Cell Physiology. He holds 28 
patents. 

Constance Seniff, Vice President, Operations. Ms. Seniff has managed LSP's operations
since 1993. Her background includes thirteen years in international business prior to joining 
LSP, five abroad in the employ of foreign firms. Ms. Seniff is responsible for helping 
formulate and implement business development and database commercialization strategies 
for LSP in coordination with the management of LSP's parent company, Large Scale Biology 
Corporation. Ms. Seniff has a B.Sc. degree in Business (with honors) from Florida State 
University. 

Robert J. Walden, Vice President, Finance at LSP. Mr. Walden joined LSP in 1997 and has 
served as a director since 1999. He previously served as Vice President of Finance and 
Administration at Osiris Therapeutics, Inc., and as Chief Financial Officer at the American 
Type Culture Collection (ATCC). Mr. Walden received his degree in Finance from the 
University of Maryland. 

Jean-Paul Hofmann, Ph.D., Vice President, Software Development at LSP. Dr. Hofmann is a
plant geneticist by training, having earned a B.S. in Biology, M.S. in Biochemistry and 
Genetics, and Ph.D. in Plant Genetics from the University of Orsay, Paris. He has extensive 
experience in using 2-DE in agronomic research and in designing analytical software for 1-
and 2-D applications. He has held senior scientific positions in industry and research 
institutes, in the U.S., France and the Ivory Coast. 

John Taylor, Ph.D., Vice President, Software Development and Bioinformatics. Dr. Taylor is
the principal developer of Kepler™, LSP's analytical software for automated 2-DE pattern 
analysis. Prior to joining LSB, Dr. Taylor served as computer scientist in the Molecular 
Anatomy Program at Argonne, and on the research staffs of the University of Chicago and 
the Armed Forces Institute of Pathology in Washington, D.C. Dr. Taylor received a B.S. in 
Physics from the University of South Carolina, and a Ph.D. in Nuclear Physics from Duke 
University. 

Sandra Steiner, Ph.D., currently serves as Vice President Proteomics Applications. Prior to 
joining the Company, Dr. Steiner founded and directed the Molecular Toxicology Group at 
Novartis in Basel, Switzerland and was a member in several multi-disciplinary drug 
development project teams. Dr. Steiner received her Ph.D. in Toxicology/Pharmacology from 
the University of Basel, Switzerland. 